Foreword



SOFSEM 2001, the International Conference on Current Trends in Theory and 
Practice of Informatics, was held on November 24 - December 1, 2001 in the 
well-known spa Piestany, Slovak Republic. This was the 28th annual conference 
in the SOFSEM series organized either in the Slovak or the Czech Republic. 

SOFSEM has a well-established tradition. Currently it is a broad, multidis- 
ciplinary conference, devoted to the theory and practice of software systems. Its 
aim is to foster cooperation among professionals from academia and industry 
working in various areas of informatics. 

The scientific program of SOFSEM consists of invited talks, which determine 
the topics of the conference, and short contributed talks presenting original re- 
sults. The topics of the invited talks are chosen so as to cover the whole range 
from theory to practice and to bring interesting research areas to the attention 
of conference participants. For the year 2001, the following three directions were 
chosen for presentation by the SOFSEM Steering Committee: 

— Trends in Informatics 

— Enabling Technologies for Global Computing 

— Practical Systems Engineering and Applications 

The above directions were covered through 12 invited talks presented by promi- 
nent researchers. There were 18 contributed talks, selected by the international 
Program Committee from among 46 submitted papers. The conference was also 
accompanied by workshops on Electronic Commerce Systems (coordinated by 
H. D. Zimmermann) and Soft Computing (coordinated by P. Hajek). 

The present volume contains invited papers (including the keynote talk given 
by Vaughan R. Pratt), the Soft Computing workshop opening plenary talk (pre- 
sented by Sandor Jenei), and all the contributed papers. 

We are grateful to the members of both the SOFSEM Advisory Board and 
the SOFSEM Steering Committee for their proposals for the conference scientific 
program and for their cooperation in contacting the invited speakers. We also 
wish to thank everybody who submitted a paper for consideration, all Program 
Committee members for their meritorious work in evaluating the submitted pa- 
pers, as well as all subreferees who assisted the Program Committee members 
in the evaluation process. We are deeply indebted to the authors of invited and 
contributed papers who prepared their manuscripts for presentation in this
volume. Our special thanks go to Miroslav Chladny, who designed and managed
electronic support for the Program Committee and who did most of the hard
technical work in preparing this volume. We are also thankful to the Organizing
Committee team led by Igor Privara, who made sure that the conference
ran smoothly in a pleasant environment. Last but not least, we would like to
thank Springer-Verlag for their excellent cooperation during the publication of
this volume. 
Leszek Pacholski, Peter Ruzicka
The Potential of Grid, Virtual Laboratories
and Virtual Organizations for Bio-sciences 



Hamideh Afsarmanesh, Ersin Kaletas, and Louis O. Hertzberger 

University of Amsterdam, Informatics Institute, 

Kruislaan 403, 1098 SJ Amsterdam, The Netherlands 
{hamideh, kaletas, bob}@science.uva.nl



Abstract. VLAM-G, the Grid-based Virtual Laboratory AMsterdam, provides 
a science portal for distributed analysis in applied scientific research. It offers 
scientists the possibility to carry out their experiments in a familiar environ- 
ment that provides seamless access to geographically distributed resources and 
devices. In this paper, the general design of the VLAM-G platform is intro- 
duced. Furthermore, the application of the VLAM-G and its extension with 
Virtual Organization concepts for specific scientific domains is presented, with 
focus on bio-sciences. 



1 The Grid-Based Virtual Laboratory in Amsterdam 

Due to the systematic growth of research efforts in experimental sciences, vast
amounts of data/information are gathered at scattered data resources all around the
world, and these must also be accessed by many geographically distributed end-users.
This scenario clearly requires shared access to resources available through global
distributed computer facilities. In addition, advanced functionalities are required in
order to allow researchers to conduct high-level scientific experimentation on top of
such a distributed computer system. These advanced functionalities include, for
instance, distributed information management, data and information disclosure, and
visualization. Modern advances in the IT area, such as the Grid [1], [2], and new
approaches, such as Virtual Laboratories (VL) [3], can be properly applied here.

In this context, the ICES/KIS-II project VLAM-G [4] of the University of Amsterdam
(UvA) aims at the design and development of an open, flexible, scalable, and
configurable framework. This framework provides the necessary Grid-based hardware
and software, enabling scientists and engineers in different areas of research to work
on their problems via experimentation while making optimum use of modern
Information Technology. The VLAM-G provides a distributed high-performance
computing and communication infrastructure with advanced information management
functionalities. It specifically addresses the experimentation requirements in, among
others, the scientific domains of biology, physics, and systems engineering. As such,
access to physically distributed data and processes among many sites in the virtual
laboratory, necessary for the achievement of complex experiments, is totally
transparent to the scientists, giving them the image of working in a single physical
laboratory.
Considering these ideas, the global reference scenario for the VLAM-G is depicted
in Fig. 1. Namely, the Virtual Laboratory is regarded as a Grid-based software layer
on top of which applications coming from different scientific domains can be developed.
For instance, this figure depicts the following specific applications:

• “Materials Analysis of Complex Surfaces” (MACS) from the chemistry and phys- 
ics domain. Here, the use of MACS methods within VLAM-G to study the proper- 
ties of material surfaces has proven to be a powerful tool, not only for fundamental 
but also for applied research in fields as diverse as art conservation, cancer therapy 
and mass spectrometry [5]. 

• The DNA-Array application from the bio-informatics domain. In this case, the 
VLAM-G supports the integration of Micro-array technology to enable the study 
of the characteristics of thousands of genes in a single experiment [6]. 

• The Radiology application, for which advanced visualization and interactive simu- 
lation techniques are required. Namely, within the VLAM-G environment, a Vir- 
tual Radiology Explorer is demonstrated based on the virtual reality environment 
and database facilities that are integrated by the Virtual Laboratory layer [7]. 

As mentioned previously, all these applications are supported by the Virtual Labo- 
ratory layer based on the Grid infrastructure. Namely, considering the performance 
requirements of these application scenarios in terms of the amount of data being ex- 
changed and the heavy computational processes that need to be executed, the use of a 
high-performance distributed resource management architecture, such as the Data
Grid, is mandatory within the Virtual Laboratory infrastructure. The main objectives
of the Data Grid are to integrate huge heterogeneous data archives into a distributed
data management Grid, and to identify services for high-performance, distributed,
data-intensive computing. As such, this infrastructure offers a wide variety of
high-performance services that are well exploited by the VLAM-G infrastructure
components.

Fig. 1. The VLAM-G global reference scenario

An architectural overview of the VLAM-G layer itself is given in Fig. 2. The
VLAM-G has a modular architecture composed of the following tiers:

• The application tier, in which the aforementioned scientific applications are
provided.

• The application toolkits tier, including the web science portals and internal core
components, such as interactive visualization and simulation (VISE), communication
and collaboration (COMCOL), and VL information management for cooperation
(VIMCO).

• The Grid middleware tier, providing the Grid services.

• The Grid resources tier, providing access to the underlying physical/logical
distributed resources.

Within the application toolkit tier, the Web-based portal and workbench interface, 
together with the modular design of the VLAM-G architecture, provide a uniform 
environment for all experiments and make it possible to attach a wide range of
software tools to the Virtual Laboratory. These range from basic tools, such as simulation,
visualization, and data storage/manipulation, to advanced facilities, such as remote
control of devices, visualization in a virtual reality environment, and federated
advanced information management. Modularity is the key to scalability and openness,
as well as to supporting inter-disciplinary research. In this way, VLAM-G solves many
technical problems that scientists face, hence enabling them to focus better on their 
experiments, and simultaneously it reduces the costs of experimentation by sharing 
the expensive resources among them. 

The VIMCO component, being developed by the CO-IM (Co-Operative Information
Management) group of the UvA, supports advanced information management
requirements of the VLAM-G applications, addressing many of the obstacles de- 
scribed earlier. 



2 Data and Information Management in Biology 

A new generation of automated high-throughput experimental technologies has paved
the way to sequence millions of base pairs a day, run thousands of assays for testing
chemical compounds, and monitor the expression profiles of thousands of genes in a
single experiment. This has led to exponential growth in the size of genome databases. At the
present rate of growth, the SWISS-PROT database for instance, will double in size 
every 40 months and the nucleotide databases will double in size about every 14 
months [8]. During the year 2000, every day over two million bases were deposited 
into GenBank, and the bio-informatics companies have already generated terabytes of 
genomic data [9]. 
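The doubling times quoted above can be turned into annual growth factors. For a doubling time of $T$ months the factor is $g = 2^{12/T}$. The Python sketch below only does this arithmetic. It assumes steady exponential growth, which the cited figures only approximate, and the function names are illustrative.

```python
import math


def annual_growth_factor(doubling_months: float) -> float:
    """Yearly multiplication factor for a quantity that doubles every `doubling_months`."""
    return 2 ** (12 / doubling_months)


def months_to_grow(factor: float, doubling_months: float) -> float:
    """Months needed to grow by `factor`, assuming steady exponential growth."""
    return doubling_months * math.log2(factor)
```

For SWISS-PROT ($T = 40$) this gives a factor of about 1.23 per year. For the nucleotide databases ($T = 14$) it gives about 1.81 per year, i.e., roughly a tenfold increase in under four years ($14 \cdot \log_2 10 \approx 46.5$ months).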
Fig. 2. VLAM-G architecture overview



At the same time, across different areas within the biosciences, research has
resulted in a large number and range of “heterogeneous” genome databases. The
heterogeneity is partially due to: (1) the wide variety of types of genomic information, (2)
various representations/formats designed for the stored data, and (3) different data 
storage organization used to store this information. In the latest compilation of key 
high-quality biological database resources of value, available all around the world 
[10], 281 databases are listed in 18 different categories of biological content. Every 
database is provided and supported by an independent and “autonomous” center. Out 
of these databases, 55 were added only since the previous compilation, reported in 
January 2000. These data resources contain information ranging from sequence to 
pathology, and gene expression to pathway aspects of the biosciences. Clearly, di- 
verse information is stored using different data storage organizations by different 
centers. Many data resources store data in flat files, each defining its own specific
data structure and format to represent the data. Others use database management
systems (DBMS) to store data, but even if the same DBMS is used by two centers, 
there are enormous differences in data representation (data models/data formats) 
designed by each center. Furthermore, querying capabilities and interfaces to retrieve 
data change from one resource to another. The heterogeneity problem grows further 
with the relatively new types of information, such as for instance the gene expression 
data, where even the consensus on defining the required minimum information set to 
be stored is still under study [11]. 
The data provided so far through these database resources reflect that the human genome
is mostly sequenced, several metabolic pathways are identified, expression profiles have
been generated for many genes, etc. This in turn opens the way to new discoveries and
advanced research through the “extraction of knowledge”
from the scattered collected data. Achieving advanced extraction of knowledge, per- 
formed either by individual scientists or in collaboration with other individuals or 
centers, primarily constitutes a “large experimentation process” with many steps. 
Namely, several distributed, heterogeneous, and autonomous sites need to be inter- 
linked, and perhaps co-work towards the achievement of a common goal. Such an 
involved process both requires and benefits from some assisting “enhancement envi- 
ronments”, that can facilitate the variety of needed functionalities and can provide 
models, mechanisms, and tools that support scientists and centers in this complex and
relatively long experimentation process. 

From the information management point of view, such experimentation process 
involves many steps including for example: access to distributed data resources from 
different public and private (autonomous) centers, finding the relevant pieces of data, 
bringing these pieces of data together, understanding what each heterogeneous piece 
means and how it can relate to the other pieces, inter-linking the relevant data, after
any processing/analysis storing the results together with some description of the
processes involved in the performed experiment for the sake of future reference to this
experiment, and furthermore sharing some results with certain other collaborators. As 
a first base to support advanced experimentation, a comprehensive integrated/unified 
meta-data model (schema) must be created representing the variety of accessed in- 
formation. This is a challenging task, especially when the experiment either covers 
several distinct areas within the biosciences (e.g. gene sequences, expressions, and 
pathways), or even more so when it is a multi-disciplinary scientific experiment (e.g. 
the proteomics research involving the fields of biology as well as physics and chemis- 
try). To fully support advanced experiments, many information management prob- 
lems need to be addressed, including for example handling: storage of very large 
scientific data sets in databases [12], incomplete and inconsistent data [13],
heterogeneity in the terminology used in meta-data [14], semantic heterogeneity in the defined
data models, syntactic heterogeneity in the data representation/ formatting, preserva- 
tion of site autonomy, and the interoperation and security issues among collaborating 
centers. Solutions addressing some of these challenges have been proposed [15, 16].
However, many important issues remain insufficiently addressed. One example is the
secure sharing of some proprietary information with only certain authorized users
while protecting it against others. Another is the design of flexible/dynamic mechanisms
for autonomous sites that enable them to define information visibility levels and
access rights for other sites at fine granularity [17].

These emerging requirements to support advanced experiments in the biosciences
create enormous challenges for research in the area of “federated information
management” [18]. They also create challenges for experimentation enhancement
infrastructures supporting: (1) the research scientists with the systematic
definition and execution of their experimentations, through the “Virtual Laboratory” 
environment [4] [19], and (2) the collaboration / interoperation of independent and 
autonomous sites requiring authorized and secure exchange of information and
coordination of their distributed but joint activities, through the “Virtual Organization”
[20] environment. 

A main target scientific domain studied by VIMCO was DNA-array experimentation
from the bioscience domain. In this study, a large set of aspects/entities common to
DNA-array scientific experiments was identified. EEDM (Experimentation
Environment Data Model), a base meta-data model for the representation of
experimental data [19], was then designed. Later on, the EEDM was further generalized
to represent the experimental data from the other domains considered in the VLAM-G.
These are the Materials Analysis of Complex Surfaces (MACS) from the physics domain
and the Electronic Fee Collection (EFC) from the engineering domain. As described
earlier in the paper, such a base harmonized/integrated meta-data model is important
not only for better support of complex experimentation in the biosciences. It is also
invaluable in supporting inter-disciplinary research. For instance, it supports the
representation of inter-related biological, chemical, and physical properties of a
certain biological element at once.

Based on the EEDM data model defined for modeling scientific experimentation,
several databases have been developed in VLAM-G. Specifically, the EXPRESSIVE
database was developed for gene expression data, in the context of the DNA-array
application [21]. EXPRESSIVE aims at both:

(1) the definition and management of information to support the storage and
retrieval of the steps and annotations involved in DNA micro-array experiments,
for the purpose of investigation and reproducibility of experiments through the
VLAM-G; and

(2) the storage and retrieval of DNA micro-array experiment results (raw data and
processed analysis results) through the VLAM-G, for the purpose of investigation,
sharing, and scientific collaboration with other scientific centers.

The VLAM-G EXPRESSIVE database model [21] complies with the MIAME
specifications [11]. It is powerful enough to store any experimental
information/annotations in the database. Thanks to the dynamic and flexible nature
of the EEDM model, EXPRESSIVE is open to necessary future extensions.

In VIMCO, in addition to the base common data manipulation mechanisms, several
libraries supporting Web-based and platform-independent database access are
provided. Furthermore, several facilities used by all VLAM-G databases have been
developed. These include distributed and multi-threaded data manipulation,
supporting multi-user access to the Virtual Laboratory environment, and XML-based
export, import, and data/information exchange for interoperation among
federated databases.
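The paper does not specify the XML format used by VIMCO. The sketch below only illustrates the general idea of XML-based export and import of an experiment record, using Python's standard `xml.etree.ElementTree`. The `Experiment` class and the element and attribute names (`experiment`, `title`, `steps`, `step`, `order`) are hypothetical and are not the EEDM schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """Hypothetical, much-simplified experiment record (not the EEDM model)."""
    exp_id: str
    title: str
    steps: list = field(default_factory=list)  # ordered protocol steps


def export_xml(exp: Experiment) -> str:
    """Serialize an experiment to an XML string for exchange with another site."""
    root = ET.Element("experiment", id=exp.exp_id)
    ET.SubElement(root, "title").text = exp.title
    steps_el = ET.SubElement(root, "steps")
    for i, step in enumerate(exp.steps):
        # The explicit order attribute lets the importer restore the sequence.
        ET.SubElement(steps_el, "step", order=str(i)).text = step
    return ET.tostring(root, encoding="unicode")


def import_xml(text: str) -> Experiment:
    """Rebuild an experiment from the XML produced by export_xml."""
    root = ET.fromstring(text)
    steps = sorted(root.find("steps"), key=lambda el: int(el.get("order")))
    return Experiment(root.get("id"), root.findtext("title"), [s.text for s in steps])
```

A round trip such as `import_xml(export_xml(e)) == e` should hold for records whose step descriptions are non-empty strings.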



3 A Biology Virtual Organization 

Most emerging applications of the future require an enhancement environment sup- 
porting proper collaboration and interoperation among different autonomous and 
heterogeneous organizations. While advances in ICT and networking provide the base 
technologies, innovative approaches in several areas including: safe communication, 
process coordination and workflow management, and federated information
management are necessary to provide the support infrastructure for the Virtual
Organizations (VO) [20] paradigm. For bio-science applications, a Biology Virtual
Organization is defined as a temporary (or permanent) alliance of enterprises and centers
in the biology sector, that come together to share skills, core competencies and re- 
sources in order to achieve common goals, and whose cooperation is supported by 
computer networks. The concept of VO complements and extends the functionality 
and potential of VLAM-G in the sense that VO reinforces the secure collaboration 
and coordinated interoperation of several individual autonomous organizations as a 
single entity, towards the achievement of VO’s common goals. Therefore, VO care- 
fully addresses the secure exchange of proprietary information, through the commu- 
nication network, among collaborating organizations, and tackles the coordination of 
distributed processes performed at different organizations. As such, several different 
VOs among different biology research groups may co-exist, while each has its own 
distinct goals, set of partners, and cooperation rules ensuring the partners' security.
Furthermore, one organization may simultaneously be a partner in several VOs, while
involved in different tasks and following different collaboration and information 
exchange rules in every VO. 

Fig. 3 illustrates the concept of Biology Virtual Organization (BVO) with an ex- 
ample, and shows how it is used together with VLAM-G. The BVO depicted in this 
figure is composed of three partners collaborating in order to develop a new
pharmaceutical drug: a pharmaceutical company (Organization A), a biotechnology company
which has an advanced DNA micro-array facility (Organization B) and a university 
based group providing a high-performance computing infrastructure (Organization 
C). All organizations have VLAM-G middleware installed. Organization A is in need 
of an advanced micro-array facility to produce the required arrays, and a high- 
performance computing facility to analyze the results, which are not locally available 
at Organization A. If the three organizations agree, then a BVO will be established, 
where organizations B and C will share their facilities with organization A, based on 
well-defined contracts. 

Among the main considerations for development of BVO support infrastructure as 
extensions to the VLAM-G, we can mention the following: (1) incorporation of in- 
formation modeling standards, (2) utilization of the existing VLAM-G with facilities 
such as reliability, robustness, security, and scalability, and (3) support for sharing 
and exchange of distributed information, maintaining the proper level of autonomy 
and security for each BVO partner, and provision of mechanisms for dynamic defini- 
tion of information visibility levels and access rights for other BVO partners. 
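The paper does not describe how visibility levels and access rights are represented. The Python sketch below illustrates one possible shape under stated assumptions. Each autonomous site keeps its own rules, keyed by (VO, partner, item), so that one partner can receive different rights in different VOs. Rules can be granted or revoked at run time. Anything not explicitly granted stays invisible. The `Visibility` levels and the `SitePolicy` class are hypothetical.

```python
from enum import IntEnum


class Visibility(IntEnum):
    """Hypothetical visibility levels, ordered from least to most revealing."""
    NONE = 0
    METADATA = 1  # e.g. only the description of a data set
    FULL = 2      # e.g. the raw and processed results themselves


class SitePolicy:
    """Access rules kept locally by one autonomous site (hypothetical sketch).

    Rules are keyed by (vo, partner, item), so the same partner can get
    different rights in different VOs, and rules can change at run time.
    """

    def __init__(self) -> None:
        self._rules: dict[tuple[str, str, str], Visibility] = {}

    def grant(self, vo: str, partner: str, item: str, level: Visibility) -> None:
        self._rules[(vo, partner, item)] = level

    def revoke(self, vo: str, partner: str, item: str) -> None:
        self._rules.pop((vo, partner, item), None)

    def allows(self, vo: str, partner: str, item: str, needed: Visibility) -> bool:
        # Default deny: an item not explicitly granted is treated as NONE.
        return self._rules.get((vo, partner, item), Visibility.NONE) >= needed
```

In the Fig. 3 scenario, Organization B's site could grant Organization A `FULL` access to its micro-array results within the drug-development BVO. The same results would remain invisible to A in any other VO in which both happen to participate.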



4 Conclusion 

In conclusion, new discoveries and emerging advanced research in the biosciences
involve complex and relatively long experimentation processes. These processes require
advanced federated information management and can benefit from the Virtual
Laboratory and Virtual Organization environments. The Grid-based Virtual Laboratory
VLAM-G and the virtual organization BVO described in this paper provide the base
infrastructure. Within it, the federated information management of VIMCO, with its
EEDM meta-data model and the EXPRESSIVE database, can help scientists overcome
the obstacles they face when realizing their experiments. It can also assist them in
properly extracting knowledge from the vast amounts of heterogeneous data held by
autonomous sources.

Fig. 3. Biology Virtual Organization (BVO)
Abstract. Reaching agreement in a distributed system is a fundamental
issue of both theoretical and practical importance. Consensus, Atomic
Commitment, Atomic Broadcast, and Group Membership, which are different
versions of this paradigm, underlie much of the existing fault-tolerant
distributed systems. We describe these problems, explain their relationships,
and state some fundamental results on their solvability, depending
on the system model. We then review and compare basic techniques 
to circumvent impossibility results in asynchronous systems: randomiza- 
tion, models of partial synchrony, unreliable failure detection. 



1 Introduction 

The design and verification of fault-tolerant distributed applications is notori- 
ously quite challenging. In the last two decades, several paradigms have been 
identified for helping in this task. Key among these are consensus, atomic com- 
mitment, atomic broadcast, and group membership. In all these problems, pro- 
cesses ought to reach some form of agreement for achieving global consistency 
in the system. Roughly speaking, consensus and atomic commitment allow pro- 
cesses to reach a common decision, which depends on their initial values and/or 
on failures. Atomic broadcast is a convenient and efficient communication tool 
for delivering messages consistently, i.e., in the same order. A group membership
service manages the formation and the maintenance of a set of processes, called a group;
each process has a local view of the group, and some form of process agreement is 
required to guarantee that local views are consistent. Since global consistency is 
what makes a collection of processes into a single system, distributed agreement 
algorithms are ubiquitous in distributed systems. 
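The ordering requirement of atomic broadcast stated informally above can be illustrated with a small checker. The Python sketch below takes a finished execution, given as the sequence of messages delivered by each process. It tests whether every two processes deliver the messages they have in common in the same relative order. It assumes each process delivers a given message at most once. It checks only this one property, not the full specification examined in Section 3.

```python
from itertools import combinations


def same_relative_order(deliveries: dict[str, list[str]]) -> bool:
    """True if every pair of processes delivers their common messages in the same order.

    `deliveries` maps a process name to the sequence of messages it delivered;
    each message is assumed to appear at most once per process.
    """
    for p, q in combinations(deliveries, 2):
        common = set(deliveries[p]) & set(deliveries[q])
        # Project both delivery sequences onto the common messages and compare.
        seq_p = [m for m in deliveries[p] if m in common]
        seq_q = [m for m in deliveries[q] if m in common]
        if seq_p != seq_q:
            return False
    return True
```

For example, `{"p1": ["a", "b", "c"], "p2": ["a", "b"], "p3": ["a", "c"]}` passes, whereas `{"p1": ["a", "b"], "p2": ["b", "a"]}` violates the ordering requirement.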

From a theoretical point of view, agreement problems are intensively studied
because they have simple, rigorous formulations and are surprisingly challenging.
Impossibility results and lower bounds have been proved, demonstrating
limitations on what problems can be solved, and with what costs. These results 
are fundamental in practice: they precisely establish the limits of what can be 
built, depending on the types of systems. Interestingly, they also provide a con- 
venient way for comparing the power of models that make different assumptions 
about time and failures. 

In this survey, our primary goal is to give precise specifications of the agree- 
ment problems mentioned above (namely, consensus, atomic commitment, atomic 
broadcast, and group membership), and to go over the most basic results known
for each of them. Remarkably, all these agreement problems are shown to be
unsolvable in fault-prone asynchronous systems. However, they are so funda- 
mental that it is crucial to find ways around this limitation. We then review and 
compare the three classical approaches to circumvent these impossibility results, 
namely partial synchrony, randomization, and failure detection. 

Given the vastness of the field, and the space limitations, we cannot attempt 
an exhaustive presentation of the material: we do not address some agreement 
problems (for instance, approximate agreement, k-agreement); we focus on the
failures which are the most common ones in practice. Furthermore, many sig- 
nificant results are just mentioned in passing or not at all, and most proofs are 
omitted. Similarly, the bibliographic references are extensive, but incomplete. A 
comprehensive treatment can be found in the remarkable book by Lynch [37] . 

Rather, we strive to give a “global picture” of the field and to put the main
results in perspective. In particular, we point out that minor differences in as- 
sumptions about the model or in the problem specifications can result in quite 
important differences in the results on solvability and efficiency. This gives evi- 
dence that a rigorous treatment is needed in the area of fault-tolerant distributed 
computing to cope with the many subtle phenomena that arise. 

This paper is organized as follows. In Section 2, we describe the models of 
computation commonly used. We successively examine the consensus, atomic 
commitment, atomic broadcast, and group membership problems in Section 3. 
Various ways of circumventing the impossibility results stated in this section 
are discussed in Section 4. In particular, some new results concerning atomic 
commitment are presented. 

2 Models of Distributed Computing 

Problems in fault-tolerant distributed computing have been studied in a large 
variety of computational models. Such models are classified into two categories
according to the communication medium: message-passing and shared-memory. In
the former, processes communicate by exchanging messages over communication 
channels; in the latter, they interact with each other via shared objects, such 
as registers, queues, etc. In this paper, we focus on message-passing models. In 
addition to the communication medium, the main features of a model are its 
degree of synchrony and the types of failures that are assumed to occur. 

2.1 Degree of Synchrony 

Synchrony is an attribute of both processes and communication channels. A 
system is said to be synchronous if it satisfies the following two properties: 

1. There is a known upper bound on message delay, that is, on the time it takes
for a message to be delivered.

2. There is a known upper bound on the time that elapses between consecutive 
steps of a process. 







Bernadette Charron-Bost 



A system is asynchronous if there are no such bounds. In other words, there are no
timing assumptions in asynchronous systems.

Distributed applications are hard to design in the asynchronous model since
no timing information can be used by algorithms. However, this model is attractive
and has been extensively studied for several reasons. It has simple semantics.
Asynchronous algorithms are also easier to port than those relying on specific
timing assumptions, since they are guaranteed to run correctly whatever the actual
timing behavior of the system.
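Consider a process that pings another and waits for a reply; this shows why the two known bounds matter. The Python sketch below assumes the synchronous bounds, with message delay at most `max_delay` and at most `max_step_gap` between consecutive steps. As a simplification, it also assumes that a reply costs one message each way plus one step of the responder, and that the waiting process measures elapsed time exactly. Under these assumptions, a reply that has not arrived within the bound proves a crash. In an asynchronous system no such finite bound exists, so a late reply never distinguishes a crashed process from a slow one. The function names are illustrative.

```python
def reply_deadline(max_delay: float, max_step_gap: float) -> float:
    """Worst-case ping/reply round trip under the simplified synchronous assumptions:
    the ping takes at most max_delay, the responder reacts within max_step_gap,
    and the reply takes at most max_delay."""
    return 2 * max_delay + max_step_gap


def is_crashed(elapsed: float, got_reply: bool,
               max_delay: float, max_step_gap: float) -> bool:
    """Crash verdict that is sound only when the synchronous bounds really hold."""
    return not got_reply and elapsed > reply_deadline(max_delay, max_step_gap)
```

With `max_delay = 10` and `max_step_gap = 2`, a missing reply after more than 22 time units indicates a crash. If the bounds are violated, as they may be in an asynchronous system, the same rule can wrongly declare a slow process crashed.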



2.2 Failure Model 

In an execution, a component (i.e., a process or a link) is faulty if its behavior 
deviates from the one prescribed by the algorithm it is running; otherwise, it is 
correct. A failure model specifies in what way a faulty component can deviate 
from its code. 



Process Failure. One mainly considers two types of process failures: 

1. Crash: a faulty process stops prematurely in the middle of its execution. 
Before stopping, it behaves correctly. 

2. Byzantine:^ a faulty process can exhibit any behavior whatsoever. For
example, it can generate messages or change state arbitrarily, without following
its code. 

In both of these failure models, one usually needs to assume limitations on
the number of process failures. In some works on the analysis of systems with process
failures, these limitations take the form of probability distributions
governing the frequency of failures. Instead of using probability, works on
distributed algorithms simply assume that the number of failures is bounded in
advance by a fixed number t.^
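Simple arithmetic relates these two ways of limiting failures. For illustration only, assume that each of the n processes fails independently with probability p during an execution. The probability that more than t processes fail is then a binomial tail, which the Python sketch below computes. An algorithm designed for at most t failures is given no guarantee in those executions. The function name `prob_more_than` is illustrative.

```python
from math import comb


def prob_more_than(n: int, p: float, t: int) -> float:
    """P[more than t of n processes fail], each failing independently with probability p.

    Sums the binomial terms for k = t+1 .. n directly (no 1 - P[<= t]),
    which avoids cancellation when the tail is tiny.
    """
    q = 1.0 - p
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(t + 1, n + 1))
```

For instance, with n = 5, p = 0.01 and t = 1, the bound is exceeded with probability about 9.8 × 10⁻⁴. This is small, but it is not zero.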



Link Failure. It is commonly supposed that link failure can result only in 
lost messages. Models in which incorrect messages may be delivered are seldom 
studied because in practice, the use of checksums allows the system to detect 
garbled messages and discard them. A link failure may be transient and so yield 
the destruction of some messages, or it may cause all messages sent over the link 
to be lost. This latter type of link failure may lead to network partition, that is,
the communication graph becomes disconnected. Then, some pairs of processes
cannot communicate, making most problems unsolvable.

^ The term Byzantine was first used for this type of failure in a landmark paper by
Lamport, Pease, and Shostak [36], in which the consensus problem is formulated in
terms of Byzantine generals.

^ At first sight, such an assumption is realistic in practice, in the sense that it is
“unlikely” that more than t failures occur. However, in most practical situations,
if the number of failures is already large, it is likely that more failures will occur.
Assuming a bound on the number of failures implies that failures are negatively
correlated, whereas in practice, failures are independent or positively correlated.

In order to cope with link failures, there are basically two classical approaches. The first one places the responsibility for a message loss on the sender or the receiver, so that link failures are translated into process failures. This gives rise to a new type of process failure, called omission failures, which are intermediate between crash and Byzantine failures. Unfortunately, this approach turns out to have bad side effects: it puts artificial limits on the number and the location of lossy links (instead of just limiting the number of message losses), and it results in an undesirable weakening of the problem requirements if the specification refers only to the behavior of correct processes (as for the non-uniform agreement problems introduced in Section 3).

The second approach uses data link protocols to mask message losses, i.e., to 
simulate reliable links. The problem of implementing reliable communications 
using unreliable links has been extensively studied. For example, the reader can 
refer to [7,48,1,8]. 
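As an illustration of how a data link protocol can mask message losses, here is a minimal simulation of the classic alternating-bit protocol, one of the simplest such protocols; we do not claim it is the protocol studied in [7,48,1,8]. It assumes that frames may be lost but are never corrupted or reordered, and it models time-outs implicitly as a retransmission after each loss. The function name and its parameters are hypothetical.

```python
import random


def alternating_bit_transfer(messages, loss_prob=0.3, seed=0, max_attempts=10_000):
    """Simulate the alternating-bit protocol over a link that may drop frames.

    Data frames and acknowledgements are each lost independently with
    probability loss_prob (assumed < 1), but never corrupted or reordered.
    Returns the messages handed to the receiving application, in order.
    """
    rng = random.Random(seed)
    delivered = []
    expected_bit = 0  # receiver: sequence bit of the next *new* message
    sender_bit = 0    # sender: sequence bit attached to the current message
    attempts = 0
    for msg in messages:
        acked = False
        while not acked:
            attempts += 1
            if attempts > max_attempts:
                raise RuntimeError("too many retransmissions")
            if rng.random() < loss_prob:
                continue  # data frame lost: sender times out and retransmits
            if sender_bit == expected_bit:
                delivered.append(msg)  # new message: deliver it exactly once
                expected_bit ^= 1
            # Otherwise the frame is a retransmitted duplicate: do not deliver it.
            # The receiver acknowledges sender_bit; the ack itself may be lost,
            # in which case the sender times out and retransmits the same frame.
            acked = rng.random() >= loss_prob
        sender_bit ^= 1  # the next message uses the other bit


    return delivered


print(alternating_bit_transfer(["m1", "m2", "m3"], loss_prob=0.5, seed=1))
# prints ['m1', 'm2', 'm3']
```

The single sequence bit is enough because the sender never has more than one unacknowledged message outstanding: a duplicate always carries the bit the receiver has just moved past.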

In the following, we consider process failures, but assume that links are reliable.

2.3 Formal Model 

Now, we briefly present formal models for asynchronous and synchronous systems; notation and definitions are borrowed from [27,21] and [37].

We have tried to write this paper so that it can mostly be understood with minimal formal prerequisites. The reader not interested in the impossibility proofs of consensus and atomic commitment (presented in Section 3) can skip the following formal definitions, except for the definition of a round in the synchronous model.

We consider distributed systems consisting of a set of n processes $\Pi = \{p_1, \ldots, p_n\}$. Every pair of processes is connected by a reliable channel.



Asynchronous Systems. Each process $p_i$ has a buffer, $\mathit{buffer}_i$, that represents the set of messages that have been sent to $p_i$ but that are not yet received. An algorithm A is a collection of n deterministic automata, one for each process. The automaton which runs on $p_i$ is denoted by $A_i$. A configuration C of A consists of:



— n process states $\mathit{state}_1(C), \ldots, \mathit{state}_n(C)$ of $A_1, \ldots, A_n$, respectively;

— n sets of messages $\mathit{buffer}_1(C), \ldots, \mathit{buffer}_n(C)$, representing the messages presently in $\mathit{buffer}_1, \ldots, \mathit{buffer}_n$.

Configuration C is an initial configuration if every state $\mathit{state}_i(C)$ is an initial state of $A_i$ and every $\mathit{buffer}_i(C)$ is empty. Computations proceed in steps of A. In each step, a unique process $p_i$ atomically (1) receives a single message from the message buffer $\mathit{buffer}_i$, or a “null” message meaning that no message is received by $p_i$ during this step, (2) changes its state, and (3) may send messages to other processes, depending on its state at the beginning of the step and on the message received in the step. The message received during the receive phase of a step of $p_i$ is chosen nondeterministically among the messages in $\mathit{buffer}_i$ and the null message. In particular, the null message may be received even though $\mathit{buffer}_i$ is not empty. A step executed by process $p_i$ is applicable to a configuration C if the message received in this step is either the null message or present in $\mathit{buffer}_i(C)$.

A schedule of A is a finite or infinite sequence of A’s steps. A schedule S is applicable to a configuration C if the steps of S are applicable in turn, starting from C. If S is finite, $S(C)$ denotes the resulting configuration, which is said to be reachable from C.
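The following Python sketch mirrors these definitions: a configuration is a pair (states, buffers), a step (i, m) has process i receive m or the null message (None), and a finite schedule S is applied in turn to obtain S(C). The automaton interface, the helper names, and the two toy automata are assumptions made for illustration; processes are indexed from 0 rather than 1.

```python
from typing import Callable, Optional, Sequence, Tuple

# A configuration is (states, buffers): states[i] is the state of automaton i and
# buffers[i] holds the messages sent to process i but not yet received (kept as
# a tuple, so two identical messages can coexist).
Config = Tuple[tuple, tuple]
# A step (i, m): process i receives m from its buffer, or the null message (None).
Step = Tuple[int, Optional[object]]
# Hypothetical automaton interface: (state, received message or None)
#   -> (new state, list of (destination, message) pairs to send).
Automaton = Callable[[object, Optional[object]], Tuple[object, list]]


def applicable(config: Config, step: Step) -> bool:
    i, m = step
    return m is None or m in config[1][i]  # the null message is always receivable


def apply_step(config: Config, step: Step, automata: Sequence[Automaton]) -> Config:
    """One atomic step: receive (at most) one message, change state, send."""
    if not applicable(config, step):
        raise ValueError(f"step {step} is not applicable")
    states = list(config[0])
    buffers = [list(b) for b in config[1]]
    i, m = step
    if m is not None:
        buffers[i].remove(m)  # consume exactly one copy of m
    states[i], sends = automata[i](states[i], m)
    for dest, payload in sends:
        buffers[dest].append(payload)
    return tuple(states), tuple(tuple(b) for b in buffers)


def apply_schedule(config: Config, schedule: Sequence[Step],
                   automata: Sequence[Automaton]) -> Config:
    """S(C) for a finite schedule S: apply the steps in turn, starting from C."""
    for step in schedule:
        config = apply_step(config, step, automata)
    return config


def pinger(state, m):
    # Process 0: on its first step, send "ping" to process 1.
    return ("sent", [(1, "ping")]) if state == "init" else (state, [])


def ponger(state, m):
    # Process 1: record that "ping" has been received.
    return ("got-ping", []) if m == "ping" else (state, [])


C0 = (("init", "init"), ((), ()))  # initial configuration: empty buffers
final = apply_schedule(C0, [(0, None), (1, "ping")], [pinger, ponger])
print(final)  # (('sent', 'got-ping'), ((), ()))
```

Note that the step (1, "ping") is not applicable to C0 itself: "ping" becomes receivable only after process 0 has taken its step.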

In the asynchronous model, we will only consider crash failures (cf. Section 3). A process $p_i$ is correct in an infinite schedule S provided it takes infinitely many steps in S, and it is faulty otherwise. A run of algorithm A in the asynchronous model is a pair $\langle C_I, S \rangle$, where $C_I$ is an initial configuration of A, S is an infinite schedule of A applicable to $C_I$, and every message sent to a correct process is eventually received in S.



Synchronous Systems. In the case of synchronous systems, computations can be organized in synchronized rounds. This gives rise to a simple computational model that is quite convenient to describe, prove, and assess distributed algorithms in the synchronous case.

As for asynchronous systems, each process $p_i$ has a buffer denoted $\mathit{buffer}_i$. An algorithm A of the synchronous model consists, for each process $p_i \in \Pi$, of the following components: a set of states denoted by $\mathit{states}_i$, a nonempty subset $\mathit{init}_i$ of $\mathit{states}_i$ representing the possible initial states of $p_i$, a message-generation function $\mathit{msgs}_i$ mapping each pair in $\mathit{states}_i \times \Pi$ to a unique (possibly null) message, and a state-transition function $\mathit{trans}_i$ mapping $\mathit{states}_i$ and vectors (indexed by $\Pi$) of messages to $\mathit{states}_i$. In any execution of A, each process $p_i$ repeatedly performs the following two stages:

1. Apply $\mathit{msgs}_i$ to the current state to generate the messages to be sent to each process. Put these messages in the appropriate buffers.

2. Apply $\mathit{trans}_i$ to the current state and the messages present in $\mathit{buffer}_i$ to obtain the new state. Remove all messages from $\mathit{buffer}_i$.

The combination of these two actions is called a (synchronized) round of A.^ 

A process can exhibit a crash failure by stopping anywhere in the middle of its execution. In terms of the model, the process may fail before, after, or in the middle of performing Stage 1 or Stage 2. This means that the process may succeed in sending only a subset of the messages it is supposed to send, thus creating inconsistency in the system. A process can also exhibit a Byzantine failure by generating its next messages or next state in an arbitrary way, without following its code, i.e., the rules specified by its message-generation and state-transition functions.
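To make the effect of a crash during Stage 1 concrete, the sketch below executes one synchronized round and lets a crashing process reach only a subset of the destinations. The function names, the crash_plan argument, and the choice that a crashing process never performs Stage 2 are hypothetical modeling choices, not part of the formal model above.

```python
def run_round(states, msgs, trans, crash_plan):
    """Execute one synchronized round among the processes in `states`.

    states:     dict pid -> current state (alive processes only)
    msgs:       msgs(p, state, q) -> message from p to q, or None (null message)
    trans:      trans(q, state, inbox) -> new state; inbox maps sender -> message
    crash_plan: dict pid -> set of destinations the process reaches in Stage 1
                before crashing in this round (it never performs Stage 2)
    Returns the new states of the processes that survive the round.
    """
    pids = sorted(states)
    inbox = {q: {} for q in pids}
    # Stage 1: every process generates and sends its messages; a process that
    # crashes during this round manages to send only to a subset of processes.
    for p in pids:
        for q in pids:
            if p in crash_plan and q not in crash_plan[p]:
                continue
            m = msgs(p, states[p], q)
            if m is not None:
                inbox[q][p] = m
    # Stage 2: surviving processes apply their state-transition function.
    return {q: trans(q, states[q], inbox[q]) for q in pids if q not in crash_plan}


def send_values(p, state, q):
    return frozenset(state)  # each process sends the set of values it knows


def merge_values(q, state, inbox):
    return set(state).union(*inbox.values())


# p_0 crashes in the middle of Stage 1, after reaching p_1 only.
after = run_round({0: {"a"}, 1: {"b"}, 2: {"c"}}, send_values, merge_values,
                  crash_plan={0: {1}})
print(sorted(after[1]), sorted(after[2]))  # ['a', 'b', 'c'] ['b', 'c']
```

The two survivors end the round with different knowledge, which is exactly the inconsistency that a partial send creates.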

^ A classical way of emulating synchronized rounds in a synchronous system uses a simple time-out mechanism. Time-out periods are then determined by the timing bounds available in the synchronous system.
A schedule of an algorithm is defined as a finite or infinite sequence of successive rounds. As for the asynchronous model, we define the notions of configuration, initial configuration, and run. In this model, the time complexity of an algorithm is measured in terms of the number of rounds until all the required outputs are produced.

3 Agreement Problems in Distributed Systems 

3.1 Consensus 

The consensus problem is a simplified version of a problem that originally arose in the development of on-board aircraft systems. In this problem, some components in a redundant system ought to settle on a value, given slightly different readings from different sensors (e.g., altimeters). Consensus algorithms are thus incorporated into the hardware of fault-tolerant systems to guarantee that a collection of processors carry out identical computations, agreeing on the results of some critical steps. This redundancy allows the system to tolerate the failure of some processors.

In the consensus problem, each process starts with an initial value from a 
fixed set V, and must eventually reach a common and irrevocable decision from 
V. More formally, the consensus problem is specified as follows: 

Agreement: No two correct processes decide differently. 

Validity: If all processes start with the same value v, then v is the only possible decision value.

Termination: All correct processes eventually decide. 

An alternative validity condition is as follows: 

Strong validity: If a process decides v, then v is the initial value of some 
process. 

Obviously, this condition implies the validity condition that we stated first, and these two conditions are equivalent for binary consensus, that is, when $V = \{0, 1\}$.
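The relationship between the two validity conditions can be checked by brute force on small instances. In this sketch (the helper names are hypothetical), a pair (inputs, d) is a vector of initial values and a decision value; the search confirms that strong validity implies validity, that the two coincide when V = {0, 1}, and that they differ as soon as V has three values.

```python
from itertools import product


def validity(inputs, d):
    # If all processes start with the same value v, then v is the only possible decision.
    return len(set(inputs)) > 1 or d == inputs[0]


def strong_validity(inputs, d):
    # If a process decides d, then d is the initial value of some process.
    return d in inputs


def allowed_only_by_validity(values, n):
    """Pairs (inputs, d) allowed by validity but forbidden by strong validity."""
    return [(inputs, d)
            for inputs in product(values, repeat=n)
            for d in values
            if validity(inputs, d) and not strong_validity(inputs, d)]


def strong_implies_validity(values, n):
    return all(validity(inputs, d)
               for inputs in product(values, repeat=n)
               for d in values if strong_validity(inputs, d))


print(strong_implies_validity([0, 1, 2], 3))       # True
print(allowed_only_by_validity([0, 1], 3))         # []
print(allowed_only_by_validity([0, 1, 2], 2)[:2])  # [((0, 1), 2), ((0, 2), 1)]
```

With V = {0, 1, 2}, starting from mixed values 0 and 1, validity still permits deciding 2, while strong validity does not.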

The agreement condition of consensus may sound odd because it allows two processes to disagree even if one of them fails a very long time after deciding. Clearly, such disagreements are undesirable in many applications since they may lead the system to inconsistent states. This is why one introduces a strengthening of the agreement condition, called the uniform agreement condition, which precludes any disagreement, even one involving faulty processes:

Uniform agreement: No two processes (whether correct or not) decide differ- 
ently. 

The problem that results from substituting uniform agreement for agreement is called the uniform consensus problem. The uniform agreement condition is clearly not achievable if processes may commit Byzantine failures, since this failure model imposes no limitation on the possible behaviors, and thus on the possible decisions, of faulty processes. Basically, this is the reason why consensus, which was first studied in the Byzantine failure model, was originally stated with non-uniform conditions [42,36]. Afterwards, the problem specifications that were studied were often non-uniform, even in the setting of benign failures: numerous results have been stated for consensus [27,21,23,22,25,13], and only a few are about uniform consensus [24,41,37].



Consensus in Synchronous Systems. We now give a brief survey of the 
main results on consensus in synchronous systems. 

Various algorithms that tolerate crash failures have been devised. The most basic one is the FloodSet algorithm [37]: each process repeatedly broadcasts the set of values it has ever seen. If at most t processes may crash, the synchronized round model guarantees that, by the end of the first t + 1 rounds, all the processes still alive have seen exactly the same values: since at most t processes crash, at least one of these t + 1 rounds is free of crashes, at the end of that round all alive processes hold the same set, and this remains true in later rounds. At the end of round t + 1, processes thus agree on the set of values they have seen, and so can use a common decision rule based on this set to decide safely.
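A small simulation of FloodSet in the synchronized round model, under stated assumptions: processes are indexed from 0, a crashing process reaches an adversary-chosen subset of processes in its crash round, and the decision rule (decide the unique value seen, otherwise a fixed default) is one common choice rather than the only one. The function name and the rounds parameter are hypothetical; rounds is used only to show what goes wrong with t rounds.

```python
def floodset(inputs, t, crashes, default=0, rounds=None):
    """Simulate FloodSet among processes 0..n-1 in the synchronous round model.

    inputs:  initial value of each process
    t:       maximum number of crashes; the algorithm runs t + 1 rounds
    crashes: dict pid -> (round, set of pids it still reaches in that round)
    rounds:  override of the number of rounds (only to show that t is too few)
    Returns dict pid -> decision for the processes that never crash.
    """
    assert len(crashes) <= t, "more crashes than the algorithm tolerates"
    n = len(inputs)
    rounds = t + 1 if rounds is None else rounds
    W = {p: {inputs[p]} for p in range(n)}  # values seen so far
    alive = set(range(n))
    for r in range(1, rounds + 1):
        received = {q: set() for q in alive}
        for p in sorted(alive):
            crash_round, reached = crashes.get(p, (None, None))
            targets = reached if crash_round == r else alive
            for q in targets:
                if q in received:
                    received[q] |= W[p]  # p broadcasts every value it has seen
        alive -= {p for p, (cr, _) in crashes.items() if cr == r}
        for q in alive:
            W[q] |= received[q]
    # Decision rule (one common choice): the unique value seen, else a default.
    return {q: next(iter(W[q])) if len(W[q]) == 1 else default for q in sorted(alive)}


crash = {0: (1, {1})}  # p_0 crashes in round 1 after reaching p_1 only
print(floodset([0, 1, 1], t=1, crashes=crash))            # {1: 0, 2: 0}
print(floodset([0, 1, 1], t=1, crashes=crash, rounds=1))  # {1: 0, 2: 1}
```

With a single round, p_1 has seen {0, 1} while p_2 has seen only {1}, so they disagree; the second round lets p_1 forward the value 0 to p_2.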

As mentioned above, uniform consensus is clearly not solvable in the Byzantine failure model, no matter how many processes are faulty. On the other hand, consensus has been shown to be solvable if fewer than one third of the processes are faulty [42,36]. The $n > 3t$ restriction is not accidental: there is no consensus algorithm in a synchronous system with n processes which tolerates t Byzantine failures if $2 \le n \le 3t$. Interestingly, the impossibility proof given in [36] is based on a reduction to the case of $n = 3$ and $t = 1$.

Like FloodSet, the consensus algorithms in [42,36] which tolerate Byzantine failures use t + 1 rounds. Indeed, these algorithms are optimal with respect to the number of rounds required for deciding: there is no consensus algorithm, for either type of failure, in which all the alive processes decide by the end of t rounds. This t + 1 lower bound was originally stated by Fischer and Lynch [28] in the case of Byzantine failures. The result was then extended to the case of crash failures, first for uniform consensus by Dwork and Moses [25], and subsequently for consensus by Lynch [37]. With respect to this worst-case time complexity, consensus and uniform consensus are therefore two equivalent problems in the crash failure model. A simpler proof of this t + 1 lower bound for crash failures was presented by Aguilera and Toueg [2]. This latter proof combines a bivalency argument borrowed from [27] and a reduction to systems in which at most one process crashes in each round.

We can refine this analysis by discriminating runs according to the number of failures that actually occur: we consider the number of rounds required to decide not over all the runs of an algorithm that tolerates t crash failures, but over all the runs of the algorithm in which at most f processes crash, for any $0 \le f \le t$. Charron-Bost and Schiper [16] prove that uniform consensus requires at least f + 2 rounds, whereas consensus requires only f + 1 rounds, if f is less than t − 1. For $f = t - 1$ or $f = t$, the discrepancy between consensus and its uniform version no longer holds: both consensus and uniform consensus require f + 1 rounds. Dolev, Reischuk, and Strong [22] develop “early deciding” consensus algorithms that achieve the general f + 1 lower bound for consensus: they design an early deciding algorithm such that, in each run with at most f failures, all processes have decided by the end of round f + 1. Consequently, uniform consensus is harder than consensus in the synchronous model with crash failures. For uniform consensus, Charron-Bost and Schiper [16] show that their lower bound is also tight. Note that early deciding algorithms are quite interesting in practice since they are much more efficient in the failure-free case, which is the most frequent case since failures are occasional.



Consensus in Asynchronous Systems. Although consensus is solvable in synchronous systems in the presence of failures, either benign (crash) or severe (Byzantine), this problem is unsolvable in an asynchronous system that is subject to even a single crash failure, as established by Fischer, Lynch, and Paterson in their seminal paper [27]. This impossibility result is fundamental both in itself and because of the new concepts introduced for its proof.

Essentially, the impossibility of consensus in the asynchronous case stems from the inherent difficulty of determining whether a process has actually crashed or is only very slow. The proof in [27] is based on this fundamental feature of asynchronous systems, but implementing this intuitive argument in a rigorous manner is quite subtle. Fischer, Lynch, and Paterson introduced new ideas that have subsequently been used extensively for proving other results: the notion of the valency of a configuration, and the round-robin process for constructing failure-free runs.

More precisely, they consider the binary consensus problem, and for each configuration C, they define the set of decision values of the configurations reachable from C, denoted $\mathit{Val}(C)$. If $\mathit{Val}(C) = \{0, 1\}$, then C is said to be bivalent; otherwise, C is univalent.
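For a finite transition graph, Val(C) can be computed by exploring every configuration reachable from C (including C itself, via the empty schedule). The sketch below does so on a hand-written toy graph; actual algorithms have infinite configuration spaces, so this only illustrates the definition. The graph encoding and the function names are assumptions.

```python
from collections import deque


def valency(successors, decided, c):
    """Val(c): decision values of the configurations reachable from c.

    successors: dict config -> configs obtained by applying one applicable step
    decided:    dict config -> value some process has decided in that config
    """
    seen, todo, values = {c}, deque([c]), set()
    while todo:
        x = todo.popleft()
        if x in decided:
            values.add(decided[x])
        for y in successors.get(x, ()):
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return values


def classify(successors, decided, c):
    return "bivalent" if valency(successors, decided, c) == {0, 1} else "univalent"


# Hypothetical toy fragment: from C, one step leads towards a 0-decision
# and another step leads towards a 1-decision.
successors = {"C": ["A", "B"], "A": ["A0"], "B": ["B1"], "A0": ["A0"], "B1": ["B1"]}
decided = {"A0": 0, "B1": 1}
print(classify(successors, decided, "C"), classify(successors, decided, "A"))
# bivalent univalent
```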

Roughly speaking, their proof is structured as follows. For the sake of contradiction, they suppose that there exists a binary consensus algorithm A that tolerates one crash, and then prove the following assertions:

1. Because of the asynchrony of the system, A has an initial bivalent configuration (Lemma 2 of [27]).

2. If a step s is applicable to a bivalent configuration C, then s can be delayed to yield a new bivalent configuration. More precisely, there is a finite schedule ending with the step s, applicable to C, and leading to a configuration that is still bivalent (Lemma 3 of [27]).

3. Using a round-robin argument and Lemma 3 of [27], one can construct a failure-free run starting from an initial bivalent configuration provided by Lemma 2 of [27], such that all the configurations in this run are bivalent. In other words, processes remain forever indecisive in this run of A, and so A violates the termination condition of the consensus specification.
The delicate point of this proof is the construction of an “indecisive” run that is admissible, namely a run in which communications are reliable and at most one process crashes.

Interestingly, the impossibility of consensus is much easier to prove if at least half of the processes may be faulty ($n \le 2t$), using a standard “partitioning” argument. We now recall this direct proof.

Proof (of the impossibility of consensus if $n \le 2t$). The proof is by contradiction. Suppose algorithm A solves consensus in asynchronous systems with $t \ge \lceil n/2 \rceil$. Partition the processes into two sets $\Pi_0$ and $\Pi_1$ such that $\Pi_0$ contains $\lceil n/2 \rceil$ processes, and $\Pi_1$ contains the remaining $\lfloor n/2 \rfloor$ processes. Consider the following two runs of A, both admissible since $|\Pi_1| \le |\Pi_0| \le t$:

— $\rho^0$ starts from the initial configuration $C_0$ in which all the initial values are 0. All processes in $\Pi_0$ are correct, while those in $\Pi_1$ crash at the beginning of the run.

— $\rho^1$ starts from the initial configuration $C_1$ in which all the initial values are 1. All processes in $\Pi_1$ are correct, while those in $\Pi_0$ crash at the beginning of the run.

By validity and termination, all correct processes decide 0 in $\rho^0$, and 1 in $\rho^1$. Let $\sigma^0$ (resp. $\sigma^1$) be the shortest prefix of the schedule associated to $\rho^0$ (resp. $\rho^1$) such that all the processes in $\Pi_0$ (resp. $\Pi_1$) decide 0 (resp. 1) in $\sigma^0(C_0)$ (resp. $\sigma^1(C_1)$). We now consider the initial configuration C where all the initial values of the processes in $\Pi_0$ are 0 and those of the processes in $\Pi_1$ are 1. The concatenation $\sigma = \sigma^0; \sigma^1$ is a possible schedule of A, applicable to C: $\sigma^0$ contains only steps of processes in $\Pi_0$, which start in the same states as in $C_0$ and receive only messages sent within $\Pi_0$, and likewise for $\sigma^1$ and $\Pi_1$. Hence, in $\sigma(C)$, the processes in $\Pi_0$ have decided 0 and those in $\Pi_1$ have decided 1. By a simple round-robin argument,^ we can extend $\sigma$ to get a failure-free run $\rho$ starting from C, which violates the agreement condition, a contradiction.
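The round-robin construction used in this proof (detailed in the footnote) can be sketched as follows: processes take steps in queue order, each receiving the oldest message in its FIFO buffer or the null message, and the stepping process then moves to the back of the queue. The function round_robin and the toy make_greeter automata are hypothetical, and the sketch builds only a finite prefix of the infinite run.

```python
from collections import deque


def round_robin(states, buffers, automata, num_steps):
    """Build the first num_steps steps of a round-robin (failure-free) extension.

    states:   list of process states (updated in place)
    buffers:  list of deques, oldest message first (updated in place)
    automata: automata[i](state, msg) -> (new_state, [(dest, payload), ...]),
              where msg is None for the null message
    Returns the schedule as a list of (pid, received message) steps.
    """
    queue = deque(range(len(states)))  # arbitrary initial order
    schedule = []
    for _ in range(num_steps):
        i = queue.popleft()  # the first process in the queue takes the step
        msg = buffers[i].popleft() if buffers[i] else None  # earliest sent, or null
        states[i], sends = automata[i](states[i], msg)
        for dest, payload in sends:
            buffers[dest].append(payload)
        schedule.append((i, msg))
        queue.append(i)  # move it to the back of the queue
    return schedule


def make_greeter(i, n):
    # Toy automaton: greet every other process once, and count received messages.
    def step(state, msg):
        sent, count = state
        sends = [] if sent else [(j, f"hello from {i}") for j in range(n) if j != i]
        return (True, count + (msg is not None)), sends
    return step


n = 3
states = [(False, 0)] * n
buffers = [deque() for _ in range(n)]
schedule = round_robin(states, buffers, [make_greeter(i, n) for i in range(n)], 9)
print(states)  # [(True, 2), (True, 2), (True, 2)]
```

After three full turns of the queue, every process has taken steps and has received every message sent to it, which is what makes the infinite version of this construction a failure-free run.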

The impossibility of consensus in asynchronous systems is extremely robust: it still holds when weakening many assumptions of the [27] model. In particular, the proof works even when considering nondeterministic processes, with any type of broadcast communication except atomic broadcast, or if receiving and sending are split into two separate steps. Moreover, Fischer, Lynch, and Paterson prove their impossibility result for a very weak validity condition, which we call the 0-1 validity condition, and which only stipulates that 0 and 1 are both possible decision values. More formally, this condition is as follows:

0-1 Validity: There exist two runs in which the decision values are 0 and 1, 
respectively. 



^ More precisely, the run $\rho$ is constructed step by step, starting from $\sigma(C)$: the processes of $\Pi$ are maintained in a queue in an arbitrary order. Each step is executed by the first process in the queue, which is then moved to the back of the queue. In every step taken by process $p_i$, this process receives the earliest sent message in $\mathit{buffer}_i$, or the null message if $\mathit{buffer}_i$ is empty. Every process takes infinitely many steps in $\rho$ and receives every message sent to it. Therefore, $\rho$ is a failure-free run.
3.2 Atomic Commitment

In a distributed database system, ensuring that transactions terminate consistently is a critical task: the sites whose databases were updated by the transaction must agree on whether to commit it (i.e., its results will take effect at all the sites) or abort it (i.e., its results will be discarded).

More specifically, we consider a collection of processes which participate in the processing of a database transaction. After this processing, each process arrives at an initial “opinion” about whether the transaction ought to be committed or aborted, and votes Yes or No accordingly. A process votes for committing the transaction if its local computation on behalf of that transaction has been successfully completed; otherwise, it votes for aborting the transaction. The processes are supposed to eventually output a decision, by setting an output variable decision to Commit or Abort. For this problem, the correctness conditions are:

Agreement: No two processes decide on different values. 

Validity: 

1. If any process initially votes No, then Abort is the only possible deci- 
sion. 

2. If all processes vote Yes and there is no failure, then Commit is the 
only possible decision. 

with a termination condition which comes in two flavors:

Weak termination: If there is no failure, then all processes eventually decide.

Non-blocking termination: All correct processes eventually decide.

The problems specified by these conditions are called atomic commitment 
(AC) for the weak termination and non-blocking atomic commitment (NB-AC) 
for the non-blocking termination. Generally, both AC and NB-AC are studied 
in the context of the failure model where links are reliable and only processes 
may fail by crashing. 
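The AC and NB-AC conditions can be checked mechanically on the outcome of a run. This sketch assumes the outcome is given as the votes, the set of crashed processes, and the decisions actually taken (a finite record standing in for an infinite run); the function name and the encoding are hypothetical. Agreement is checked over all processes, as the specification requires.

```python
def check_commit_run(votes, crashed, decisions):
    """Check one finished run against the AC / NB-AC conditions.

    votes:     list of "Yes"/"No", one per process
    crashed:   set of processes that crashed in the run
    decisions: dict pid -> "Commit"/"Abort" for the processes that decided
    Returns a dict condition -> bool.
    """
    n = len(votes)
    values = set(decisions.values())
    failure_free = not crashed
    correct = set(range(n)) - set(crashed)
    return {
        "agreement": len(values) <= 1,
        "validity_1": "No" not in votes or values <= {"Abort"},
        "validity_2": not (failure_free and set(votes) == {"Yes"}) or values <= {"Commit"},
        "weak_termination": not failure_free or set(decisions) == set(range(n)),
        "nonblocking_termination": correct <= set(decisions),
    }


# Hypothetical blocked run: everybody voted Yes, p_0 crashed, nobody decided.
report = check_commit_run(["Yes", "Yes", "Yes"], crashed={0}, decisions={})
print(report["weak_termination"], report["nonblocking_termination"])  # True False
```

Such a run is acceptable for AC, whose termination requirement only applies to failure-free runs, but not for NB-AC.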

Clearly, the NB-AC and uniform consensus problems are very similar: when 
identifying Yes and Commit with 1, and No and Abort with 0, they only 
differ in the validity condition. At this point, it is relevant to consider the weak 
validity condition introduced by Hadzilacos in [31]: 

Weak validity: If there is no failure, then any decision value is the initial value 
of some process. 

Obviously, the weak validity condition is weaker than the validity conditions of both consensus and atomic commitment, but stronger than the 0-1 validity condition. Therefore, the impossibility result of [27] immediately implies that NB-AC cannot be solved in asynchronous systems with crash failures, even if only one process may fail. Actually, the impossibility of NB-AC can be shown by a simple argument which does not require the subtle Lemma 3 of [27]. We now give this direct proof.
Proof (of impossibility of NB-AC). Suppose, to obtain a contradiction, that there is an algorithm A which solves NB-AC and tolerates one failure. We first state a general technical lemma for the asynchronous model.^ For that, we define the failure-free valency of an initial configuration C as the set of decisions which are reachable from C in failure-free runs. This set is denoted by $\mathit{Val}^f(C)$.

Lemma 1. For any initial configuration C, the valency and the failure-free valency of C are equal:

$$\mathit{Val}^f(C) = \mathit{Val}(C).$$

Proof. Obviously, $\mathit{Val}^f(C) \subseteq \mathit{Val}(C)$. Conversely, let $d \in \mathit{Val}(C)$, and let $\rho$ be a run of A starting from C in which the decision value is d. By the termination condition, there is a finite prefix $\sigma$ of $\rho$ such that some process has decided d in the configuration $\sigma(C)$. By a round-robin argument, we construct an extension $\rho^f$ of $\sigma$ that is a failure-free run of A. Since decisions are irrevocable and $\rho^f$ extends $\sigma$, the decision value in $\rho^f$ is d, and so $d \in \mathit{Val}^f(C)$.



Lemma 2. The initial configuration where all processes vote Yes is bivalent.

Proof. Let $C_1$ be the initial configuration where all processes vote Yes. By the second part of the validity condition, $\mathrm{Commit} \in \mathit{Val}(C_1)$.

Consider a run $\rho$ of A starting from $C_1$ in which only one process, say p, is faulty and crashes from the beginning. The infinite schedule of events corresponding to $\rho$ is applicable to the initial configuration where all processes vote Yes except process p, which votes No. We thus construct a run $\rho'$ of A which is indistinguishable from $\rho$ to any process other than p. In particular, any process other than p decides the same value in $\rho$ and $\rho'$. By the first part of the validity condition, the decision value in $\rho'$ is Abort, and so processes decide Abort in $\rho$. This shows that $\mathrm{Abort} \in \mathit{Val}(C_1)$, so $\mathit{Val}(C_1) = \{\mathrm{Abort}, \mathrm{Commit}\}$, as needed.

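Combining Lemmas 1 and 2 immediately yields a contradiction with the second part of the validity condition: by Lemma 2, $\mathrm{Abort} \in \mathit{Val}(C_1)$, so by Lemma 1, $\mathrm{Abort} \in \mathit{Val}^f(C_1)$, i.e., some failure-free run starting from $C_1$, where all processes vote Yes, decides Abort.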

Though the impossibility of NB-AC is much easier to prove than that of uniform consensus, the latter problem cannot be reduced to NB-AC in general. Charron-Bost and Toueg [18] actually show that (1) uniform consensus is not reducible to NB-AC except if t = 1; (2) conversely, NB-AC is never reducible to uniform consensus. In other words, uniform consensus and NB-AC are two agreement problems with quite similar specifications, but which are not comparable.

^ Actually, this lemma holds in any computational model that displays no failure [40], i.e., such that any finite schedule applicable to some configuration C has an infinite extension which is a failure-free run. Note that this lemma has already been stated in [21] to prove an impossibility result for consensus.

Contrary to its non-blocking version, AC is attainable in asynchronous systems with crash failures. The simplest and best-known AC algorithm is the two-phase commit (2PC) algorithm [29]. This algorithm and various variations of it are discussed in [10]. Unfortunately, the 2PC algorithm may cause blocking even when running in synchronous systems. Skeen [47] devised the three-phase commit (3PC) algorithm, an embellishment of the 2PC algorithm, which guarantees non-blocking termination in synchronous systems. In the event of timing failures, the 3PC algorithm may lead to inconsistent decisions, which is probably the main reason why it is not used in practice.
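To see where 2PC can block, here is a deliberately simplified sketch of its two phases with process 0 as coordinator: participants send their votes, the coordinator decides Commit only if every vote is Yes, and then sends the outcome to everybody. The function name and the two coordinator crash points are hypothetical, and the sketch is not taken from [29] or [10].

```python
def two_phase_commit(votes, coordinator_crash=None):
    """Simplified 2PC outcome; process 0 is the coordinator.

    votes: list of "Yes"/"No"; votes[0] is the coordinator's own vote.
    coordinator_crash: None, "before_decision" or "after_decision" (hypothetical
    crash points of the coordinator; participants never crash here).
    Returns dict pid -> "Commit", "Abort", or None (undecided, i.e., blocked).
    """
    n = len(votes)
    decisions = {p: None for p in range(n)}
    # A participant that votes No may abort unilaterally.
    for p in range(1, n):
        if votes[p] == "No":
            decisions[p] = "Abort"
    # Phase 1: participants send their votes to the coordinator.
    if coordinator_crash == "before_decision":
        # A Yes-voter cannot tell whether the coordinator decided before crashing.
        return decisions
    outcome = "Commit" if all(v == "Yes" for v in votes) else "Abort"
    decisions[0] = outcome
    if coordinator_crash == "after_decision":
        return decisions  # the outcome never reaches the participants: they block
    # Phase 2: the coordinator sends the outcome to every participant.
    for p in range(1, n):
        decisions[p] = outcome
    return decisions


print(two_phase_commit(["Yes", "Yes", "Yes"], coordinator_crash="after_decision"))
# {0: 'Commit', 1: None, 2: None}
```

In the blocked run, a participant that decided Abort on its own could contradict the coordinator's Commit, so waiting is its only safe option.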

The 3PC algorithm requires 3n rounds. This is much higher than the t + 1 lower bound for consensus and its uniform version in synchronous systems. As mentioned above, uniform consensus and NB-AC are not comparable, and so the validity condition of NB-AC may yield a different lower bound on the number of rounds required for deciding. Actually, we can observe that the proof of the t + 1 lower bound for consensus presented by Aguilera and Toueg [2] still works when considering the weak validity condition,^ a weaker proviso than the validity condition of AC. Therefore, the t + 1 lower bound also holds for the NB-AC problem. Moreover, the FloodSet algorithm can easily be modified to obtain a version that achieves NB-AC in t + 1 rounds.
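The text does not detail how FloodSet is modified for NB-AC. One plausible variant, sketched below as an assumption (the function name is hypothetical), floods (process, vote) pairs instead of values for t + 1 rounds and commits only if a Yes vote from every process is known. A missing vote, for instance from a process that crashed before sending it, therefore leads to Abort, which respects both parts of the validity condition.

```python
def floodset_nbac(votes, t, crashes):
    """FloodSet-style NB-AC sketch (hypothetical variant) in t + 1 synchronous rounds.

    votes:   list of "Yes"/"No", one per process
    crashes: dict pid -> (round, set of pids it still reaches in that round)
    Returns dict pid -> "Commit"/"Abort" for the processes that never crash.
    """
    assert len(crashes) <= t, "more crashes than the algorithm tolerates"
    n = len(votes)
    known = {p: {(p, votes[p])} for p in range(n)}  # (process, vote) pairs seen
    alive = set(range(n))
    for r in range(1, t + 2):
        received = {q: set() for q in alive}
        for p in sorted(alive):
            crash_round, reached = crashes.get(p, (None, None))
            for q in (reached if crash_round == r else alive):
                if q in received:
                    received[q] |= known[p]
        alive -= {p for p, (cr, _) in crashes.items() if cr == r}
        for q in alive:
            known[q] |= received[q]
    all_yes = {(p, "Yes") for p in range(n)}
    # Commit only if a Yes vote from every process is known; otherwise Abort.
    return {q: "Commit" if known[q] == all_yes else "Abort" for q in sorted(alive)}


print(floodset_nbac(["Yes", "Yes", "Yes"], t=1, crashes={0: (1, set())}))
# {1: 'Abort', 2: 'Abort'}
```

As in FloodSet, the surviving processes hold the same set of pairs after t + 1 rounds, so they apply the same rule to the same data and agree.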

In the 3PC algorithm, 3t rounds may be required to decide. So why is 3PC an interesting algorithm? The main reason is that in the failure-free case, which is the most frequent case, 3PC requires only 3 rounds to decide. But it is not clear whether 3 rounds are necessary to decide in failure-free runs. More generally, it would be worthwhile to determine a lower bound for early deciding NB-AC algorithms, that is, the number of rounds required to decide in a run of an NB-AC algorithm with at most f ($0 \le f \le t$) failures.



3.3 Atomic Broadcast 

One way to implement a fault-tolerant service is by using multiple servers that may fail independently. The state of the service is replicated at these servers, and updates are coordinated so that even when a subset of servers fail, the service remains available. One approach for replication management is the so-called “state-machine approach” or “active replication”, which has no centralized control. In this approach, replica coordination and consistency are ensured by forcing all replicas to receive and process the same sequence of requests. This condition can be achieved by a broadcast primitive, called atomic broadcast, which guarantees that all correct processes deliver the same messages in the same order.

Formally, atomic broadcast is a broadcast which satisfies the following four 
properties: 

Validity: If a correct process p broadcasts a message m, then p eventually 
delivers m. 

^ On the other hand, their proof does not work with the 0-1 validity condition. Indeed, Dwork and Moses [25] devised an algorithm which guarantees agreement, termination, and 0-1 validity in two rounds.
Agreement: If a correct process delivers a message m, then all correct processes eventually deliver m.

Integrity: For any message m, every process delivers m at most once, and only if m was previously broadcast.

Total order: If correct processes p and q both deliver messages m and m', then p delivers m before m' if and only if q delivers m before m'.
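Integrity and total order are safety properties, so they can be checked on the finite delivery sequences observed at correct processes, whereas validity and agreement speak about what happens eventually and cannot be refuted by a finite prefix. A minimal checker, with hypothetical function names:

```python
from itertools import combinations


def total_order_ok(deliveries):
    """deliveries: dict pid -> list of messages in the order that process delivered them."""
    positions = {p: {m: k for k, m in enumerate(seq)} for p, seq in deliveries.items()}
    for p, q in combinations(deliveries, 2):
        common = [m for m in positions[p] if m in positions[q]]
        for m1, m2 in combinations(common, 2):
            # p delivers m1 before m2 iff q delivers m1 before m2.
            if (positions[p][m1] < positions[p][m2]) != (positions[q][m1] < positions[q][m2]):
                return False
    return True


def integrity_ok(deliveries, broadcast):
    """Every message is delivered at most once, and only if it was broadcast."""
    return all(len(seq) == len(set(seq)) and set(seq) <= set(broadcast)
               for seq in deliveries.values())


print(total_order_ok({"p": ["a", "b", "c"], "q": ["a", "c"]}))  # True
print(total_order_ok({"p": ["a", "b"], "q": ["b", "a"]}))       # False
```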

Atomic broadcast is a problem that involves the achievement of some sort of agreement, namely agreement on the sequence of messages delivered by correct processes, in a fault-tolerant manner, and so has a common flavor with the consensus problem. Indeed, consensus and atomic broadcast are equivalent in asynchronous systems with crash failures. First, consensus can easily be reduced to atomic broadcast as follows [21]: to propose a value, a process atomically broadcasts it; every process then decides on the value of the first message that it delivers. Note that this reduction makes no assumption on the system synchrony or topology, and tolerates any number of crash failures. Using the impossibility result for consensus [27], this reduction shows that atomic broadcast cannot be solved in asynchronous systems, even if we assume that at most one process crashes.
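The reduction of consensus to atomic broadcast is short enough to sketch directly. Here atomic broadcast is mocked, which is an assumption and not an implementation: all broadcasts are placed in a single total order, and each process has delivered some nonempty prefix of it. The function name is hypothetical.

```python
import random


def consensus_from_atomic_broadcast(proposals, seed=0):
    """Decide via the reduction: atomically broadcast your proposal, then decide
    the value of the first message you deliver.

    proposals: dict pid -> proposed value.
    The atomic broadcast itself is mocked: the broadcasts are totally ordered
    into one global sequence in an arbitrary (seeded random) order, and each
    process has delivered some nonempty prefix of that sequence.
    """
    rng = random.Random(seed)
    pids = sorted(proposals)
    # Mocked atomic broadcast: one total order shared by all processes.
    sequence = [(p, proposals[p]) for p in rng.sample(pids, len(pids))]
    delivered = {p: sequence[: rng.randint(1, len(sequence))] for p in pids}
    # The reduction: decide the value carried by the first delivered message.
    return {p: delivered[p][0][1] for p in pids}


print(consensus_from_atomic_broadcast({0: "x", 1: "y", 2: "z"}, seed=3))
```

Total order alone forces every process to see the same first message, which gives agreement; the decided value was proposed by some process, which gives validity.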

Conversely, Chandra and Toueg [13] show how to transform any consensus algorithm into an atomic broadcast algorithm. Their transformation uses repeated executions of consensus. Messages that must be delivered are partitioned into batches, and the k-th execution of consensus is used to decide on the k-th batch of messages to be atomically delivered. A precise description of the reduction is given in [30,13]. This reduction shows that atomic broadcast can be solved using randomization, partially synchronous models, or some failure detectors, since consensus is solvable in such models (cf. Section 4). Note that the reduction of atomic broadcast to consensus also applies to any asynchronous system and tolerates any number of crash failures. However, it requires that reliable broadcast, the weaker broadcast which only guarantees that correct processes deliver the same set of messages, be solvable in the system, and so requires infinite storage [15].
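In the same spirit, the batching idea of [13] can be sketched with repeated executions of a mocked consensus, each deciding one batch. The mock_consensus oracle, the pacing of message arrivals, and delivering each batch in sorted order are all assumptions for illustration; the actual reduction in [30,13] relies on reliable broadcast and a real consensus algorithm.

```python
import random


def mock_consensus(proposals, rng):
    """Hypothetical stand-in for one consensus execution: every process gets the
    same decision, and it is one of the proposals (agreement, strong validity)."""
    return proposals[rng.choice(sorted(proposals))]


def atomic_broadcast_via_consensus(arrivals, seed=0):
    """arrivals: dict pid -> order in which that process receives the messages
    (all lists are permutations of the same set of messages).
    Returns the delivery sequence, which is the same at every process."""
    rng = random.Random(seed)
    pids = sorted(arrivals)
    total = len(arrivals[pids[0]])
    delivered = []  # atomically delivered so far; identical at every process
    k = 0
    while len(delivered) < total:
        k += 1
        # Hypothetical pacing: before execution k, each process has received the
        # first k messages of its own arrival order, and proposes those it has
        # received but not yet delivered.
        proposals = {p: frozenset(arrivals[p][:k]) - set(delivered) for p in pids}
        batch = mock_consensus(proposals, rng)  # k-th execution decides k-th batch
        # Every process delivers the decided batch in the same deterministic order.
        delivered.extend(sorted(batch))
    return delivered


print(atomic_broadcast_via_consensus(
    {0: ["m1", "m2", "m3"], 1: ["m3", "m1", "m2"], 2: ["m2", "m3", "m1"]}, seed=0))
```

Because every process applies the same sequence of decided batches, and orders each batch by the same deterministic rule, all processes deliver the same messages in the same order even though they received them in different orders.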

3.4 Group Membership 

The problem of group membership has been the focus of much theoretical and experimental work on fault-tolerant distributed systems. A group membership protocol manages the formation and maintenance of a set of processes called a group. For example, a group may be a set of processes that are cooperating towards a common task (e.g., the primary and backup servers of a database), a set of processes that share a common interest (e.g., clients that subscribe to a particular newsgroup), or the set of all processes in the system that are currently deemed to be operational. In general, a process may leave a group because it has failed, because it has voluntarily requested to leave, or because it has been forcibly expelled by other members of the group. Similarly, a process may join a group; for example, it may have been selected to replace a process that has recently left the group. A group membership protocol must manage such dynamic changes in some coherent way: each process has a local view of the current membership of the group, and processes maintain some form of agreement on these local views.

Two types of group membership services have emerged: primary-partition, 
e.g. [45,34,39,38,32], and partitionable, e.g. [3,33,49,5,26]. Roughly speaking, a 
primary-partition group membership service maintains a single agreed view of 
the group (i.e., processes agree on their local views of the group). Such services 
are intended for systems with no network partitions, or for systems that allow 
the group membership to change in at most one network partition, the primary 
partition. In contrast, a partitionable group membership service allows multiple 
views of the group to co-exist and evolve concurrently: there may be several 
disjoint subsets of processes such that processes in each subset agree that they 
are the current members of the group. In other words, such group membership 
services allow group splitting (e.g., when the network partitions) and group 
merging (e.g., when communication between partitions is restored). 

The group membership problem was first defined for synchronous systems in [20]. Since then, the group membership problem for asynchronous systems has also been the subject of intense investigation. Yet, despite the wide interest that it has attracted and the numerous publications on this subject, the group membership problem for asynchronous systems is far from being well understood. In particular, there is no commonly agreed definition for this problem, and some of the most referenced formal definitions are unsatisfactory [4].

Despite their differences, all versions of the group membership problem require some form of process agreement in systems with failures. The potential for running into an impossibility result as for consensus or atomic commitment is therefore obvious. On the other hand, group membership is different from consensus in at least two ways:

— In group membership, a process that is suspected to have crashed can be removed from the group, or even killed, even if this suspicion is actually incorrect (e.g., the suspected process was only very slow). The [27] model does not address process removals, and it does not directly model process killing (i.e., program-controlled crashes).

— Up to this point, all the specifications of the agreement problems have a 
termination condition, i.e., require progress in all runs. Group membership 
does allow runs that “do nothing” (for instance, “doing nothing” is desirable 
when no process wishes to join or leave the group, and no process crashes). 

These differences have been widely cited as reasons why group membership is 
solvable in asynchronous systems while other classical agreement problems are 
not. 

For primary-partition group membership services, Chandra, Charron-Bost,
Hadzilacos, and Toueg [14] actually prove that these reasons are not sufficient
to make group membership solvable: they define a problem called WGM (for Weak
Group Membership) that allows the removal of erroneously suspected processes
from the group and is subsumed by any reasonable definition of group
membership, and they show that WGM cannot be solved in asynchronous systems
with failures, even in systems that allow program-controlled process crashes.
On closer inspection, the implementations of primary-partition group
membership turn out not to satisfy the very weak liveness requirement of WGM:
they have runs that “block” forever, or that remove or kill all processes.⁷

In contrast to primary-partition group membership services, partitionable
ones allow processes to disagree on the current membership of the group:
several different views of the group may evolve concurrently and independently
of each other, which accommodates group splitting (e.g., when the network
partitions) and group merging (e.g., when communication between partitions is
restored).

By allowing disagreement, such group membership services escape the
impossibility result of [14]. However, they run into another fundamental problem:
their specification must be strong enough to rule out useless group membership
protocols (in particular, protocols that can capriciously split groups into
singleton sets), yet weak enough to remain solvable. The design of such a
specification is still an open problem.

4 Circumventing Impossibility Results 

As explained in Section 3, agreement problems are not solvable in asynchronous
systems, even for a single crash failure. However, agreement problems are so
fundamental in distributed computing that it is important to find ways to cope
with this limitation. To make agreement problems solvable, we can strengthen
the model, weaken the correctness requirements, or both. Three approaches have
thus been investigated to circumvent the impossibility of agreement problems:
partial synchrony, randomization, and unreliable failure detection.

4.1 Partial Synchrony 

This approach is based on the observation that between the synchronous model
and the asynchronous model there lies a variety of intermediate models, called
partially synchronous. In a partially synchronous system, processes have
information about time, although this information might be partial or inexact.
For example, processes in a partially synchronous system might have access to
synchronized clocks, or might know bounds on message delivery time or on
relative process speeds.

Dolev, Dwork, and Stockmeyer [21] consider the following five synchrony 
parameters: 

1. synchronous processes: processes know a bound on process step time; 

2. synchronous communication: processes know a bound on message delivery
time;

3. synchronous message order: messages are delivered in order of sending;

4. broadcast transmission: in an atomic step, a process can broadcast messages
to all processes;

5. atomic receive/send: receiving and sending are part of the same atomic step.

⁷ For example, the implementation of S-GMP [45] can crash all processes in the
system before any new view is installed.

The third parameter is nonclassical: synchronous message order is similar to
the causal order condition [35,46], with the causality relation replaced by
real-time order. Varying these five parameters yields 2^5 = 32 partially
synchronous models. Within the space of these models, [21] precisely delineates
the boundary between solvability and unsolvability of consensus: they identify
four “minimal” cases in which a consensus algorithm that tolerates n - 1 crashes
exists, but weakening any synchrony parameter yields a model of partial
synchrony in which consensus is unsolvable. These four minimal cases are:

- synchronous processes and synchronous communication (corresponding to
the completely synchronous model);

- synchronous processes and synchronous message order;

- broadcast transmission and synchronous message order;

- synchronous communication, broadcast transmission, and atomic receive/send.

In particular, unlike synchronous processes, synchronous communication on its
own makes consensus solvable in the [27] model (i.e., with broadcast
transmission and atomic receive/send). In any weaker model, with point-to-point
communication or separate receive and send, synchronous processes and
synchronous communication are both required for consensus to be solvable in the
presence of two failures.
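The delineation of [21] can be checked mechanically. The Python sketch below is our own illustration, not code from [21]: it encodes each of the 32 models as the set of synchrony parameters that hold and, reading the four minimal cases as the exact boundary, treats a model as admitting consensus precisely when it contains one of them. The short parameter labels are hypothetical names.

```python
from itertools import combinations

# Hypothetical labels for the five synchrony parameters of [21]:
# procs = synchronous processes, comm = synchronous communication,
# order = synchronous message order, bcast = broadcast transmission,
# atomic = atomic receive/send.
PARAMS = ("procs", "comm", "order", "bcast", "atomic")

# The four minimal models in which consensus is solvable.
MINIMAL_CASES = (
    frozenset({"procs", "comm"}),
    frozenset({"procs", "order"}),
    frozenset({"bcast", "order"}),
    frozenset({"comm", "bcast", "atomic"}),
)


def solvable(model: frozenset) -> bool:
    """True iff the model (set of parameters that hold) contains a minimal case."""
    return any(case <= model for case in MINIMAL_CASES)


def all_models():
    """Enumerate all 2**5 = 32 combinations of the five parameters."""
    for k in range(len(PARAMS) + 1):
        for combo in combinations(PARAMS, k):
            yield frozenset(combo)


models = list(all_models())
solvable_models = [m for m in models if solvable(m)]
print(len(models), len(solvable_models))  # 32 17

# Minimality: dropping any parameter from a minimal case loses solvability.
assert all(not solvable(case - {p}) for case in MINIMAL_CASES for p in case)
```

For example, the [27] model plus synchronous communication, `frozenset({"comm", "bcast", "atomic"})`, is solvable, whereas `frozenset({"procs", "bcast", "atomic"})` is not, matching the observation that synchronous processes alone do not suffice there.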

Two other partially synchronous models are considered in [23]. The first model
assumes that there are bounds on process step time and on message delivery
time, but these bounds are not known. The second model assumes that these
bounds are known (i.e., the system is synchronous) but hold only after some
unknown time called the global stabilization time. In both of these partially
synchronous models, consensus is shown to be solvable if the ratio of faulty
processes is less than 1/2 for crash failures, and less than 1/3 for Byzantine
failures.

The algorithms devised in [23] use the rotating coordinator paradigm and
proceed in “non-synchronized rounds”. The coordinator of each round tries to
get other processes to change to some value v that it considers “acceptable”; it
decides v if it receives enough acknowledgments from processes that have
changed their value to v, so that no value other than v will ever be found
acceptable in subsequent rounds. This scheme already appears in [44] and has
been used many times to design algorithms that may fail to terminate but never
allow disagreement when the system malfunctions (i.e., when the synchrony
assumptions are not met).
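The “enough acknowledgments” condition is where the 1/2 and 1/3 thresholds come from. The sketch below assumes, as is common in rotating-coordinator algorithms (the exact quorum sizes used in [23] may differ), that a coordinator waits for acknowledgments from n - t processes. It checks by brute force that any two such quorums share a process exactly when n > 2t, and computes their minimum overlap, which exceeds t (so it still contains a correct process when up to t processes are Byzantine) exactly when n > 3t. The function names are hypothetical.

```python
from itertools import combinations


def coordinator(round_no: int, n: int) -> int:
    """Rotating coordinator: round r is coordinated by process r mod n."""
    return round_no % n


def quorums_intersect(n: int, t: int) -> bool:
    """Brute force: do every two sets of n - t processes (out of n) share a process?"""
    quorums = [frozenset(q) for q in combinations(range(n), n - t)]
    return all(a & b for a in quorums for b in quorums)


def min_overlap(n: int, t: int) -> int:
    """Smallest possible intersection of two (n - t)-subsets of n processes."""
    return max(0, 2 * (n - t) - n)


for n, t in [(5, 2), (4, 2), (4, 1), (3, 1)]:
    # Crash failures need quorums to intersect (n > 2t); Byzantine failures
    # need the overlap to exceed t, so it contains a correct process (n > 3t).
    print(f"n={n} t={t} crash-safe={quorums_intersect(n, t)} "
          f"byzantine-safe={min_overlap(n, t) > t}")
```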
4.2 Randomization

As explained in Section 3.1, there are two fundamental limitations for the
consensus problem: the t + 1 round lower bound in synchronous systems and the
impossibility result of [27] for completely asynchronous systems. Unlike the
partial synchrony approach, randomization helps us cope with both limitations.

In this approach, we both strengthen the model and weaken the correctness
requirements of consensus. On the one hand, we augment the system model with
randomization: each process has access to an oracle (usually called a coin)
that provides random bits, so processes can make random choices during the
computation. On the other hand, the correctness conditions are slightly weaker
than before: validity and agreement are still required, but termination need
only be guaranteed with probability 1. The precise meaning of this
probabilistic termination involves the notion of an adversary. A randomized
distributed algorithm has a high degree of uncertainty, and hence a large set
of possible executions, because of both the nondeterministic choices (which
process takes the next step and which message is received) and the
probabilistic choices. The nondeterministic choices must be resolved in order
to obtain a purely probabilistic system. It is convenient to imagine that the
nondeterministic choices are under the control of an adversary.⁸ Given a
randomized distributed algorithm and the strategy employed by an adversary A,
the probabilities of the random choices induce a well-defined probability
distribution on the set $\mathcal{E}_A$ of executions that are under the
control of A. The probabilistic termination condition means that, for every
adversary A, termination is satisfied with probability 1 in $\mathcal{E}_A$.

The first two randomized consensus algorithms appeared in 1983 and were devised
by Ben-Or [9] and Rabin [43], respectively. The two algorithms are similar in
flavor; the principal difference between them lies in the random oracles they
use. In Rabin’s algorithm, the coin is shared by processes, which thus see the
same random bit; implicitly, this induces some kind of agreement among
processes. Ben-Or was the first to give a truly distributed algorithm in which
processes flip private coins. In both Rabin’s and Ben-Or’s work, the original
goal was to achieve consensus in a completely asynchronous system, but the
resulting algorithms can be run in a synchronous environment, often with high
resiliency. Moreover, the expected running time of the synchronous versions of
these algorithms is considerably less than the lower bound of t + 1 rounds for
nonrandomized algorithms.⁹ Later on, other randomized algorithms for consensus
have been proposed; they differ in various respects: the type of adversary they
tolerate, resiliency, expected running time, and communication costs (for a very
complete survey of randomized consensus algorithms, see [19]).
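The following Python sketch gives a rough, deliberately simplified picture of why the kind of coin matters. It models only a worst-case round in which every process falls back on its coin, and it ignores the adversary, message scheduling, and the value-locking logic of the real algorithms. With private coins (as in Ben-Or’s algorithm), all n coins agree with probability 2/2^n, so the expected number of such rounds is 2^(n-1); with a shared coin (as in Rabin’s algorithm), every process sees the same bit by construction. The function names are hypothetical.

```python
import random
from fractions import Fraction


def p_private_coins_agree(n: int) -> Fraction:
    """Probability that n independent fair private coins all show the same bit."""
    return Fraction(2, 2 ** n)


def expected_rounds_private(n: int) -> Fraction:
    """Expected number of coin-flipping rounds until all n private coins agree
    (geometric distribution with success probability p)."""
    return 1 / p_private_coins_agree(n)


def simulate_private(n: int, rng: random.Random) -> int:
    """Flip n private coins per round; return the round in which they first agree."""
    rounds = 0
    while True:
        rounds += 1
        bits = {rng.randint(0, 1) for _ in range(n)}
        if len(bits) == 1:
            return rounds


# A shared coin (Rabin-style) gives every process the same bit by construction,
# so this source of delay disappears entirely.
rng = random.Random(42)
for n in (3, 6, 8):
    avg = sum(simulate_private(n, rng) for _ in range(500)) / 500
    print(n, float(expected_rounds_private(n)), round(avg, 1))
```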

⁸ Various types of adversaries can be considered, which are more or less
powerful (cf. [19]): adversaries may have limitations on their computing power
and on the information that they can obtain from the system. In particular, a
malicious adversary has knowledge of the past execution, including information
about past random choices.
⁹ More precisely, this holds for Rabin’s algorithm, and for Ben-Or’s algorithm
with sufficiently small t.
These results about consensus show that, contrary to the classical
(centralized) case, the randomized distributed model is computationally more
powerful than the deterministic one. However, randomization does not completely
eliminate the lower bounds of consensus (the t + 1 lower bound in synchronous
systems and the impossibility result for the asynchronous model); it pushes
these limitations further out. Indeed, by a refinement of the partitioning
argument of Section 3.1, Bracha and Toueg [11] show that there is no randomized
consensus algorithm if half or more of the processes may be faulty (n ≤ 2t).
Furthermore, Bar-Joseph and Ben-Or [6] prove a tight lower bound of
$\Theta\big(t/\sqrt{n \log(2 + t/\sqrt{n})}\big)$ on the expected number of
rounds needed by randomized consensus algorithms in the crash failure model.
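To get a feel for the Bar-Joseph and Ben-Or bound, the sketch below evaluates the expression t / sqrt(n log(2 + t/sqrt(n))) with constant factors omitted, so only the growth rate, not the absolute values, is meaningful. It assumes the natural logarithm (changing the base only changes a constant factor) and compares the result with the deterministic t + 1 rounds for the largest t satisfying n > 2t.

```python
import math


def bj_bo_growth(n: int, t: int) -> float:
    """t / sqrt(n * log(2 + t / sqrt(n))), constant factors omitted
    (assumption: natural logarithm)."""
    return t / math.sqrt(n * math.log(2 + t / math.sqrt(n)))


for n in (100, 10_000, 1_000_000):
    t = (n - 1) // 2  # largest t with n > 2t
    print(f"n={n:>9} t+1={t + 1:>7} randomized~{bj_bo_growth(n, t):8.1f}")
```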

Until now, we have focused on consensus. But what about the other classical 
agreement problems? Since atomic broadcast can be reduced to consensus in 
asynchronous systems with crash failures [13], it can be solved using random- 
ization in such environments. To the best of my knowledge, the randomization 
approach has not been investigated yet for the atomic commitment and group 
membership problems. 



4.3 Unreliable Failure Detectors 

An alternative approach to solving agreement problems in fault-prone
asynchronous systems is to strengthen the model by adding a new type of
component called a failure detector. Since impossibility results for
asynchronous systems stem from the inherent difficulty of determining whether a
process has actually crashed or is only very slow, the idea is to augment the
asynchronous computational model with a failure detection mechanism that may
make errors.

More specifically, a failure detector is a distributed oracle that gives some
(possibly incorrect) hints about which processes may have crashed so far.¹⁰ Each
process has access to a failure detector module that it consults in each step.
The notion of failure detector is defined and developed by Chandra and Toueg
[13] and by Chandra, Hadzilacos, and Toueg [12]. The precise definition of a
failure detector D and the computational model of an asynchronous system
equipped with D are given in both papers.

Given two failure detectors D and D', Chandra and Toueg [13] formally define
what it means for D to provide at least as much information as D' does. To do
so, they introduce the notion of reducibility among failure detectors. Roughly
speaking, a failure detector D' is reducible to D if there is a distributed
algorithm that can transform D into D'. We also say that D' is weaker than D:
any problem that can be solved using D' can also be solved using D instead. We
write D' ⪯ D. The reducibility relation ⪯ is transitive, and so defines a
preorder.¹¹ Two failure detectors are equivalent if each is reducible to the
other.
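Real reductions in [13] are distributed algorithms that may exchange messages; the Python sketch below covers only the special case in which each module’s output is transformed locally, which is enough to see why composing two transformations makes the reducibility relation transitive. The detector output types and function names are hypothetical, and the final transformation is not claimed to yield any particular failure detector class.

```python
from typing import Callable, FrozenSet, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def compose(t_ab: Callable[[A], B], t_bc: Callable[[B], C]) -> Callable[[A], C]:
    """If t_ab turns D's output into D''s and t_bc turns D''s output into
    D'''s, the composite turns D's output directly into D'''s."""
    return lambda out: t_bc(t_ab(out))


# Hypothetical system of four processes.
ALL_PIDS: FrozenSet[int] = frozenset({1, 2, 3, 4})


def suspects_to_trusted(suspected: FrozenSet[int]) -> FrozenSet[int]:
    """Detector outputting suspects -> detector outputting trusted processes."""
    return ALL_PIDS - suspected


def trusted_to_single(trusted: FrozenSet[int]) -> int:
    """Detector outputting a trusted set -> detector outputting one process
    (the smallest trusted id; raises ValueError if the set is empty)."""
    return min(trusted)


suspects_to_single = compose(suspects_to_trusted, trusted_to_single)
print(suspects_to_single(frozenset({1, 2})))  # 3
```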

¹⁰ As an example, a failure detector module can maintain a list of processes that
it currently considers to be correct. Such a failure detector is very close to a
group membership service (cf. Section 3.4).

¹¹ Note that the reducibility relation ⪯ is not, in general, antisymmetric.
Chandra and Toueg [13] and Chandra, Hadzilacos, and Toueg [12] focus on the
consensus problem. They define a failure detector denoted Ω such that the output
of the failure detector module of Ω at process p is a single process.
Intuitively, when q is the output of Ω at p, the failure detector module of Ω
at process p currently considers q to be correct; we then say that p trusts q.
The Ω failure detector satisfies the following property:

There is a time after which all the correct processes always trust the
same correct process.

Chandra and Toueg [13] show that consensus can be solved in an asynchronous
system equipped with Ω if a majority of processes are correct (n > 2t).¹² Like
the consensus algorithm devised by Dwork, Lynch, and Stockmeyer [23] for the
eventually synchronous model (the partially synchronous model that is
synchronous after some unknown time), their algorithm proceeds in asynchronous
rounds and uses the rotating coordinator paradigm. Conversely, Chandra,
Hadzilacos, and Toueg [12] show that if a failure detector D can be used to
solve consensus, then Ω is reducible to D, i.e., Ω ⪯ D. Thus Ω is the weakest
failure detector for solving consensus in asynchronous systems with a majority
of correct processes. On the other hand, Chandra and Toueg [13] show that the
consensus problem cannot be solved using Ω if n ≤ 2t.

Note that the above positive result (namely, that consensus is solvable using
Ω) is less surprising than it may seem at first sight: the Ω failure detector
ensures that processes eventually agree on the name of a correct process, and
consensus is easily solvable if processes trust the same correct process. From
this standpoint, the asynchronous model augmented with Ω appears as the natural
counterpart, in the failure detector approach, of the eventually synchronous
model.
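The correspondence with the eventually synchronous model can be made concrete. A common way to approximate Ω when the system is eventually synchronous is a heartbeat scheme with adaptive timeouts: each process trusts the smallest-id process it does not currently suspect, and doubles a process’s timeout whenever it discovers that a suspicion was wrong. The Python sketch below (hypothetical class and method names; not the construction of [12] or [13]) shows one module of such a scheme. It assumes that every process periodically sends heartbeats, that links are reliable, and that message delays are eventually bounded; under those assumptions, all correct processes eventually trust the smallest-id correct process. In a purely asynchronous system, no such guarantee holds.

```python
from typing import Dict, Iterable, Set


class EventualLeader:
    """Hypothetical heartbeat-based approximation of an Omega module at one process."""

    def __init__(self, me: int, pids: Iterable[int], initial_timeout: float = 1.0):
        self.me = me
        self.pids = sorted(pids)  # must include `me`
        self.timeout: Dict[int, float] = {p: initial_timeout for p in self.pids}
        self.last_heard: Dict[int, float] = {p: 0.0 for p in self.pids}
        self.suspected: Set[int] = set()

    def on_heartbeat(self, sender: int, now: float) -> None:
        self.last_heard[sender] = now
        if sender in self.suspected:
            # The suspicion was wrong (sender was only slow): trust it again and
            # double its timeout, so that once delays are bounded it stops
            # being suspected.
            self.suspected.discard(sender)
            self.timeout[sender] *= 2

    def trusted(self, now: float) -> int:
        """Suspect processes whose heartbeat is overdue, then return the
        smallest-id process not currently suspected (never suspects itself)."""
        for p in self.pids:
            if p != self.me and now - self.last_heard[p] > self.timeout[p]:
                self.suspected.add(p)
        return min(p for p in self.pids if p not in self.suspected)
```

For instance, a module that stops hearing from process 1 switches to trusting process 2, and switches back (with a doubled timeout for 1) once a late heartbeat from 1 arrives.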

From the equivalence between atomic broadcast and consensus, we deduce that the
same positive and negative results hold for atomic broadcast. But what about the
atomic commitment problem? In a recent paper [17], I address this question and
point out the impact of the validity condition on the solvability of agreement
problems in the asynchronous environment. This paper introduces the binary flag
failure detector, denoted BiF, whose output is the flag NF (for No Failure) or F
(for Failures). This failure detector satisfies the following three properties:

1. If all processes are correct, then no flag is ever F. 

2. If the flag of some process is F, then it remains F forever. 

3. If some process crashes, then there is a time after which all the flags are F. 
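The three BiF properties constrain whole (infinite) runs. As an illustration only, the Python sketch below checks them on a finite trace of flag outputs, approximating the “there is a time after which” of property 3 by inspecting the last snapshot. The trace format and function names are hypothetical.

```python
from typing import Dict, List, Set

NF, F = "NF", "F"
Snapshot = Dict[int, str]  # process id -> flag output at one instant


def satisfies_bif(trace: List[Snapshot], crashed: Set[int]) -> bool:
    """Check the three BiF properties on a finite trace (approximation)."""
    # Property 1: if all processes are correct, no flag is ever F.
    if not crashed and any(flag == F for snap in trace for flag in snap.values()):
        return False
    # Property 2: once a process's flag is F, it remains F forever.
    procs = {p for snap in trace for p in snap}
    for p in procs:
        seen_f = False
        for snap in trace:
            flag = snap.get(p)
            if flag == F:
                seen_f = True
            elif flag == NF and seen_f:
                return False
    # Property 3 (finite-trace approximation): if some process crashes,
    # every flag in the final snapshot is F.
    if crashed and trace and any(flag != F for flag in trace[-1].values()):
        return False
    return True
```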

By comparing the consensus and atomic commitment problems in asynchronous
systems equipped with failure detectors, I show that atomic commitment can be
solved using BiF if t = 1.¹³ Conversely, if a failure detector D can be used to
solve atomic commitment, then BiF is reducible to D, i.e., BiF ⪯ D. Thus, BiF
is indeed the weakest failure detector for solving atomic commitment in
asynchronous systems with t = 1.

¹² In fact, Chandra and Toueg [13] solve consensus using another failure
detector, denoted ◇W. The eventual agreement ensured by Ω is far less evident in
◇W. As a result of their main theorem, Chandra, Hadzilacos, and Toueg [12] show
that Ω and ◇W are actually equivalent.

¹³ Note that if at most one process is faulty (t = 1), then Ω is easily reducible
to BiF.

Note that this latter impossibility result for atomic commitment is far less 
technical and difficult to prove than the analogous result stated by Chandra, 
Hadzilacos, and Toueg [12] for consensus. Basically, the proof relies on one simple 
idea: if all processes initially vote Yes, then any atomic commitment algorithm 
decides Commit if and only if there is no failure, and so can be used to detect 
whether failures occur. 

Finally, I prove in [17] that the atomic commitment problem cannot be solved
using BiF if t > 1. It thus turns out that the t = n/2 boundary of consensus is
replaced by the t = 1 boundary for atomic commitment. Intuitively, this can be
explained by the form of the validity conditions: validity of consensus is a
symmetric condition that does not refer to the failure pattern, whereas the
validity condition of atomic commitment constrains the decision value according
to whether or not failures occur.
Abstract. In this paper, we present the results of our work that seeks to
negotiate the gap between low-level features and high-level concepts in the
domain of web document retrieval. This work concerns a technique, latent
semantic indexing (LSI), which has been used for textual information retrieval
for many years. In this environment, LSI determines clusters of co-occurring
keywords, sometimes called concepts, so that a query that uses a particular
keyword can retrieve documents that do not contain this keyword but do contain
other keywords from the same cluster. In this paper, we examine the use of this
technique for content-based web document retrieval, using both keywords and
image features to represent the documents.
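To make the LSI idea concrete, here is a minimal Python/NumPy sketch on a hypothetical toy corpus (the keywords, the counts, and the choice k = 2 are invented for illustration and are not the authors’ data or settings). It computes a rank-k truncated SVD of the keyword-by-document matrix, projects a one-keyword query into the k-dimensional concept space with the usual folding-in step U_k^T q, and ranks documents by cosine similarity. Document 1 does not contain the query keyword “car”, yet in concept space it scores as high as the documents that do, because it shares co-occurring keywords with them.

```python
import numpy as np

# Hypothetical toy corpus: rows are keywords, columns are documents (term counts).
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0],  # car
    [0, 1, 1, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 0, 1],  # flower
    [0, 0, 0, 1],  # petal
], dtype=float)

k = 2  # number of latent concepts kept (illustrative choice)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

# Each document j in concept space: column j of diag(s_k) V_k^T (= U_k^T a_j).
docs_k = (np.diag(sk) @ Vtk).T


def fold_in(q: np.ndarray) -> np.ndarray:
    """Project a keyword-space query into concept space: U_k^T q."""
    return Uk.T @ q


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


q = np.zeros(len(terms))
q[terms.index("car")] = 1.0  # one-keyword query: "car"

raw_scores = [cosine(q, A[:, j]) for j in range(A.shape[1])]
lsi_scores = [cosine(fold_in(q), d) for d in docs_k]
print("raw:", np.round(raw_scores, 2))  # document 1 gets 0: no "car" in it
print("lsi:", np.round(lsi_scores, 2))  # document 1 now scores like documents 0 and 2
```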



1 Introduction 

The emergence of multimedia technology and the rapidly expanding image and video 
collections on the internet have attracted significant research efforts in providing tools 
for effective retrieval and management of visual data. Image retrieval is based on the 
availability of a representation scheme of image content. Image content descriptors 
may be visual features such as color, texture, shape, and spatial relationships, or 
semantic primitives. 

Conventional information retrieval was based solely on text, and textual
retrieval approaches have been transplanted into image retrieval in a variety
of ways. However, “a picture is worth a thousand words”: image content is much
more versatile than text, and the amount of visual data is already enormous and
still expanding very rapidly. To cope with these special characteristics of
visual data, content-based image retrieval methods have been introduced. It has
been widely recognized that image retrieval techniques should integrate both
low-level visual features, addressing the more detailed perceptual aspects, and
high-level semantic features, underlying the more general conceptual aspects of
visual data. Neither of these two types of features alone is sufficient to
retrieve or manage visual data in an effective or efficient way [1]. Although
efforts have been devoted to combining these two aspects of visual data, the
gap between them remains a major barrier for researchers. Intuitive and
heuristic approaches do not provide satisfactory performance. There is
therefore an urgent need to find the latent correlation between low-level
features and high-level concepts and to merge them from a different
perspective. How to find this new perspective and bridge the gap between visual
features and semantic features has been a major challenge in this research
field.



1.1 Image Retrieval 

Image retrieval is an extension of traditional information retrieval. Its
purpose is to retrieve, from a data source (usually a database or the entire
internet), images that are relevant to a query. Approaches to image retrieval
are to some extent derived from conventional information retrieval and are
designed to manage the more versatile and enormous amount of visual data that
exists.

The different types of information items that are normally associated with images 
are as follows: 

• Content-independent metadata: data that is not directly concerned with image 
content, but related to it. Examples are image format, author’s name, date and 
location. 

• Content metadata: 

- Content-dependent metadata: data referring to low-level or intermediate-level
features, such as color, texture, shape, spatial relationships, and their various 
combinations. 

- Content-descriptive metadata: data referring to content semantics, concerned 
with relationships of image entities to real-world entities. 

Low-level visual features such as color, texture, shape and spatial relationships are 
directly related to perceptual aspects of image content. Since it is usually easy to 
extract and represent these features and fairly convenient to design similarity 
measures by using the statistical properties of these features, a variety of content- 
based image retrieval techniques have been proposed in the past few years. High- 
level concepts, however, are not extracted directly from visual contents, but they 
represent the relatively more important meanings of objects and scenes in the images 
that are perceived by human beings. These conceptual aspects are more closely 
related to users’ preferences and subjectivity. Concepts may vary significantly in 
different circumstances. Subtle changes in the semantics may lead to dramatic 
conceptual differences. Needless to say, it is a very challenging task to extract and 
manage meaningful semantics and to make use of them to achieve more intelligent 
and user-friendly retrieval. The next section analyzes these challenges in more detail. 
1.2 Challenges

High-level conceptual information is normally represented using text
descriptors. Traditional indexing for image retrieval is text-based, and in
certain content-based retrieval techniques text descriptors are also used to
model perceptual aspects. However, text descriptions have obvious shortcomings:

• It is difficult for text to capture the perceptual saliency of visual features. 

• It is rather difficult to characterize certain entities, attributes, roles or events by 
means of text only. 

• Text is not well suited for modeling the correlation between perceptual and 
conceptual features. 

• Text descriptions reflect the subjectivity of the annotator, and the annotation
process is prone to inconsistency, incompleteness, and ambiguity, and is very
difficult to automate.

Although image content is much more complicated than the textual data stored in
traditional databases, the demand for retrieval and management tools for visual
data is even greater, since visual information is a more capable medium for
conveying ideas and is more closely related to human perception of the real
world. Image retrieval techniques should support user queries effectively and
efficiently, just as conventional information retrieval does for text [2]. In
general, image retrieval can be divided into the following two types:

• Exact Matching - This category is applicable only to static environments, or
environments in which the features of the image do not evolve over an extended
period of time. Databases containing industrial and architectural drawings or
electronic schematics are examples of such environments.

• Similarity-Based Searching - In most cases, it is not obvious which images
best satisfy the query. Different users may have different ideas, and even the
same user may have different preferences under different circumstances. It is
therefore desirable to return the several most similar images according to the
similarity measure, so as to give users a good sampling. User interaction plays
an important role in this type of retrieval. Databases containing natural
scenes or human faces are examples of such environments.
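The difference between the two types of retrieval can be sketched in a few lines of Python. The similarity measure (cosine similarity), the toy index, and the function names are assumptions made for illustration; the text does not fix a particular measure.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def exact_match(query: Sequence[float], index: Dict[str, Sequence[float]]) -> List[str]:
    """Exact matching: return images whose feature vector equals the query."""
    return [name for name, vec in index.items() if list(vec) == list(query)]


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]],
          k: int) -> List[Tuple[str, float]]:
    """Similarity-based search: the k images most similar to the query."""
    scored = ((cosine(query, vec), name) for name, vec in index.items())
    return [(name, score) for score, name in heapq.nlargest(k, scored)]


index = {"img_a": [1.0, 0.0], "img_b": [0.9, 0.1], "img_c": [0.0, 1.0]}
print(exact_match([0.8, 0.2], index))  # []
print(top_k([0.8, 0.2], index, k=2))   # img_b first, then img_a
```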

For either type of retrieval, the dynamic and versatile characteristics of image
content require expensive computations and sophisticated methodologies in the
areas of computer vision, image processing, data visualization, indexing, and
similarity measurement. To manage image data effectively and efficiently, many
schemes for data modeling and image representation have been proposed.
Typically, each of these schemes builds a symbolic image for each given
physical image to provide logical and physical data independence. Symbolic
images are then used, in conjunction with various index structures, as proxies
for image comparisons to reduce the search scope. The high-dimensional visual
data is usually reduced to a lower-dimensional subspace so that the visual
contents are easier to index and manage. Once the similarity measure has been
determined, the indexes of the corresponding images are located in the image
space and those images are retrieved from the database. Because there is no
unified framework for image representation and retrieval, some methods may
perform better than others in certain query situations. These schemes and
retrieval techniques therefore have to be integrated and adjusted on the fly to
facilitate effective and efficient image data management.



1.3 Research Goals 

The work presented in this chapter aims to improve several aspects of
content-based image retrieval by finding the latent correlation between
low-level visual features and high-level semantics and integrating them into a
unified vector space model. More specifically, the aim of this approach is to
design and implement an effective and efficient framework of image retrieval
techniques, using a variety of visual features such as color, texture, shape,
and spatial relationships. Latent semantic indexing, an information retrieval
technique, is incorporated into content-based image retrieval. By using this
technique, we hope to extract the underlying semantic structure of image content
and hence to bridge the gap between low-level visual features and high-level
conceptual information. Improved retrieval performance and a more efficient
indexing structure can also be achieved. We have investigated the following
issues in our preliminary research, and the experimental results are very
promising. Our goals are as follows:

• We aim to present a novel approach to image retrieval based on latent semantic
indexing and to explore how it helps to reveal the latent correlation between
feature sets and semantic clusters.

• We aim to experiment with a special feature extraction method, namely the
angiogram, which captures the spatial distribution of feature points. This
technique is based on extracting information from a Delaunay triangulation of
these feature points.

• We aim to present a unified framework to integrate multiple visual features 
including color histograms, shape angiograms and color angiograms with latent 
semantic indexing and demonstrate the efficacy of our image indexing scheme by 
comparing it with relevant image indexing techniques. 

• We aim to incorporate textual annotation with visual features in the proposed 
framework of image retrieval and indexing to further our efforts of negotiating the 
semantic gap. Relevance feedback and other techniques will also be integrated. 

The remaining part of this chapter is organized as follows. Section 2 introduces
the feature extraction techniques applied in our approach. The theoretical
background of latent semantic indexing and its application in textual
information retrieval are detailed in Section 3. In Section 4, we present the
preliminary results of our study of finding the latent correlation between
features and semantics. Finally, Section 5 summarizes the chapter and highlights
some proposed future work.
2 Features

In this section we present the feature extraction techniques applied in this
research work. We propose to integrate a variety of visual features with the
latent semantic indexing technique for image retrieval. These visual features
include global and subimage color histograms, as well as angiograms. Angiograms
can be used for shape-based and color-based representations, as well as for
representing the spatial relationships of image objects. Our proposed study
thus aims at a unified framework of image retrieval techniques.



2.1 Color Histogram 

Color is a visual feature that is immediately perceived when looking at an image. 
Retrieval by color similarity requires that models of color stimuli are used, such that 
distances in the color space correspond to human perceptual distances between 
different colors. Moreover, color patterns must be represented in such a way that 
salient chromatic properties are captured. 

A variety of color models have been introduced, such as RGB, HSV, CIE, LUV, and
MTM. Humans perceive colors through hue, saturation, and brightness. Hue
corresponds to the dominant wavelength of the color. Saturation indicates the
amount of white light present in a color; highly saturated colors, also known
as pure colors, have no white light component. Brightness, also called
intensity, value, or lightness, represents how light or dark the color is.
Since the combination of hue, saturation, and value reflects human perception
of color, the HSV color model has been selected as the basis for our
color-based extraction approach.

The color histogram is the most traditional and most widely used way to
represent color patterns in an image. It is a relatively efficient
representation of color content, and it is fairly insensitive to variations
caused by camera rotation or zooming [1]. It is also fairly insensitive to
changes in image resolution when images have quite large homogeneous regions,
as well as to partial occlusions.

In our study, the HSV color histogram is generated for each image at either the
whole-image level or the subimage level. At the whole-image level, a two-dimensional
global histogram of the hue and saturation components is computed. Since human
perception of color depends mostly on hue and saturation, we ignore the intensity
(value) component in our preliminary research in order to simplify the computation.
Each image is first converted from the RGB color space to the HSV color space. For
each pixel of the resulting image, hue and saturation are extracted and each is
quantized into a 10-bin histogram. The two histograms h and s are then combined into
one h × s histogram with 100 bins, which is taken as the feature vector representing
the image. This is a vector of 100 elements, $V = [f_1, f_2, \ldots, f_{100}]^T$, where
each element corresponds to one of the bins in the hue-saturation histogram.
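The whole-image hue-saturation histogram can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes 8-bit RGB pixels supplied as `(R, G, B)` tuples and uniform 10-bin quantization of hue and saturation over [0, 1]. The function name `hs_histogram` is hypothetical, and the standard-library `colorsys.rgb_to_hsv` stands in for whatever color conversion was actually used.

```python
import colorsys

def hs_histogram(pixels, bins=10):
    """Combined hue x saturation histogram of an iterable of (R, G, B) pixels.

    Pixels are 8-bit RGB tuples. The value (intensity) channel is ignored.
    Returns a flat list of bins * bins counts; index = hue_bin * bins + sat_bin.
    """
    hist = [0] * (bins * bins)
    for r, g, b in pixels:
        # colorsys expects channels in [0, 1] and returns h, s, v in [0, 1]
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Uniform quantization; clamp so that h == 1.0 or s == 1.0 lands in the last bin
        h_bin = min(int(h * bins), bins - 1)
        s_bin = min(int(s * bins), bins - 1)
        hist[h_bin * bins + s_bin] += 1
    return hist

# Pure red, mid gray, and pure blue pixels
V = hs_histogram([(255, 0, 0), (128, 128, 128), (0, 0, 255)])
```

The 100 raw counts form the vector V; dividing by the pixel count would make images of different sizes comparable, but that step is an optional assumption here.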
On the subimage level, each image is decomposed into 5 subimages, as illustrated by
the sample image in Figure 1. Such an approach was used in [3], and is a step toward
identifying the semcons [4] appearing in an image. Since the major object is very
commonly located in the central part of an image, one subimage captures the central
region of each image, and the other four subimages cover the upper-left, upper-right,
lower-left, and lower-right areas of the image. For each pixel of a subimage, hue and
saturation are extracted and each is quantized into a 10-bin histogram. The two
histograms h and s are again combined into one h × s histogram with 100 bins, which
is taken as the feature vector representing the subimage. This is a vector of 100
elements, $V = [f_1, f_2, \ldots, f_{100}]^T$, where each element again corresponds to
one of the bins in the hue-saturation histogram.
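The exact geometry of the five subimages in Figure 1 is not specified in the text, so the following Python sketch assumes one plausible layout: the four quadrants plus a centered box of half the image width and height, which overlaps all four quadrants. The function name `five_subimages` and the box convention `(left, top, right, bottom)` are hypothetical.

```python
def five_subimages(width, height):
    """Return (left, top, right, bottom) boxes for a five-subimage layout.

    Assumed geometry: four quadrants plus a central box of half the width
    and half the height, centered in the image.
    """
    w2, h2 = width // 2, height // 2
    w4, h4 = width // 4, height // 4
    return {
        "upper_left": (0, 0, w2, h2),
        "upper_right": (w2, 0, width, h2),
        "lower_left": (0, h2, w2, height),
        "lower_right": (w2, h2, width, height),
        "center": (w4, h4, w4 + w2, h4 + h2),
    }

boxes = five_subimages(192, 128)   # image size used in the experiments below
```

Each box would then be fed to the same hue-saturation histogram routine as the whole image, yielding five 100-element vectors per image.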




Fig. 1. Subimage Decomposition 



Since both global and subimage color histograms are formulated as a feature 
vector, it is very easy to use them as the input for latent semantic indexing. 



2.2 Angiogram 

In this section, we first provide some background concepts for Delaunay triangulation 
in computational geometry, and then present the geometric triangulation-based 
angiogram for encoding spatial correlation, which is invariant to translation, scale, 
and rotation. 

Let $P = \{p_1, p_2, \ldots, p_n\}$ be a set of points in the two-dimensional Euclidean
plane, namely the sites. Partition the plane by assigning each point in the plane to its
nearest site. All the points assigned to $p_i$ form the Voronoi region $V(p_i)$, which
consists of all the points x that are at least as close to $p_i$ as to any other site:

$$V(p_i) = \{\, x : \lVert p_i - x \rVert \le \lVert p_j - x \rVert,\ \forall j \ne i \,\}.$$

Some of the points, however, do not have a unique nearest site. The set of all points
that have more than one nearest site forms the Voronoi diagram V(P) for the set of
sites.
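The nearest-site labeling that defines Voronoi regions can be illustrated directly. In this hypothetical Python helper, `nearest_sites` returns every site at minimal distance from a point x: a single index means x lies inside that site's Voronoi region, while two or more indices mean x lies on the Voronoi diagram. The tolerance `tol` is an assumption added to cope with floating-point ties.

```python
import math

def nearest_sites(x, sites, tol=1e-9):
    """Return the indices of all sites nearest to point x.

    One index: x is inside that site's Voronoi region V(p_i).
    Two or more indices: x lies on the Voronoi diagram V(P).
    """
    dists = [math.dist(x, p) for p in sites]
    best = min(dists)
    return [i for i, d in enumerate(dists) if d - best <= tol]

sites = [(0, 0), (2, 0)]
inside = nearest_sites((0.5, 0), sites)   # strictly closer to site 0
on_edge = nearest_sites((1, 5), sites)    # equidistant: on the bisector x = 1
```

This brute-force check costs O(n) per query point; it only illustrates the definition, whereas practical Voronoi and Delaunay constructions run in O(n log n) for all sites.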

Construct the dual graph G for a Voronoi diagram V(P) as follows: the nodes of G
are the sites of V(P), and two nodes are connected by an arc if their corresponding
Voronoi polygons share a Voronoi edge. In 1934, Delaunay proved that when the
dual graph is drawn with straight lines, it produces a planar triangulation of the
Voronoi sites P, called the Delaunay triangulation D(P). Each face of D(P) is a
triangle, called a Delaunay triangle.

The spatial layout of a set of points can be coded through an angiogram. One
discretizes and counts the angles produced by the Delaunay triangulation of a set of
unique feature points in some context, given the selection criteria of what the bin size
will be and of which angles will contribute to the final angle histogram. An important
property of our proposed angiogram for encoding spatial correlation is its invariance
to translation, scale, and rotation. Given the Delaunay triangulation of a set of N
points, the corresponding angiogram can be computed in O(max(N, #bins)) time.
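A minimal Python sketch of the angiogram step, assuming the Delaunay triangles have already been produced by some triangulation routine and are given as triples of `(x, y)` vertices. The helper names `triangle_angles` and `angiogram` are hypothetical. The flag `use_largest` selects the two largest or the two smallest angles of each triangle, matching the two selection criteria mentioned in the text. Since a Delaunay triangulation of N points has O(N) triangles, the loop plus bin initialization runs in O(max(N, #bins)).

```python
import math

def triangle_angles(a, b, c):
    """Interior angles (in degrees) of triangle abc at vertices a, b, and c."""
    def angle_at(p, q, r):
        v1 = (q[0] - p[0], q[1] - p[1])
        v2 = (r[0] - p[0], r[1] - p[1])
        cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
        # Clamp against rounding error before acos (degenerate triangles are not handled)
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))
    return [angle_at(a, b, c), angle_at(b, c, a), angle_at(c, a, b)]

def angiogram(triangles, bin_width=5, use_largest=True):
    """Histogram of the two largest (or two smallest) angles of each triangle."""
    n_bins = int(180 // bin_width)
    hist = [0] * n_bins
    for tri in triangles:
        angles = sorted(triangle_angles(*tri))
        chosen = angles[1:] if use_largest else angles[:2]
        for ang in chosen:
            hist[min(int(ang // bin_width), n_bins - 1)] += 1
    return hist

# One triangle with angles of about 36.87, 71.57 and 71.57 degrees
tris = [((0, 0), (5, 0), (1, 3))]
largest = angiogram(tris)
smallest = angiogram(tris, use_largest=False)
```

Because only angles are counted, translating, scaling, or rotating the point set leaves the histogram unchanged, which is the invariance property claimed above.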

The shape angiogram approach can be used for image object indexing, while the 
color angiogram can be used as a spatial color representation technique. 

In the shape angiogram approach, those objects that will be used to index the 
image are identified, and then a set of high-curvature points along the object 
boundary are obtained as the feature points. The Delaunay triangulation is performed 
on these feature points and thus the feature point histogram is computed by 
discretizing and counting the number of either the two largest angles or the two 
smallest angles in the Delaunay triangles. 

To apply the color angiogram approach, color features and their spatial 
relationship are extracted and then coded into the Delaunay triangulation. Each image 
is decomposed into a number of non-overlapping blocks. Each individual block is 
abstracted as a unique feature point labeled with its spatial location and feature 
values. The feature values in our experiment are dominant or average hue and 
saturation in the corresponding block. Then, all the normalized feature points form a 
point feature map for the corresponding image. For each set of feature points labeled 
with a particular feature value, the Delaunay triangulation is constructed and then the 
feature point histogram is computed by discretizing and counting the number of either 
the two largest angles or the two smallest angles in the Delaunay triangles. Finally, 
the image will be indexed by using the concatenated feature point histogram for each 
feature value. 
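The block-to-feature-point step of the color angiogram can be sketched as follows, assuming each block has already been labeled with one quantized feature value (for example its dominant hue bin) and that block centroids are normalized to [0, 1]. The function name `point_feature_map` is hypothetical. Each resulting group of points would then be triangulated with a Delaunay routine, for instance `scipy.spatial.Delaunay`, and passed to the angiogram step; that part is not shown.

```python
from collections import defaultdict

def point_feature_map(labels):
    """Group normalized block centroids by their feature value.

    labels: M x N grid (list of rows) holding one quantized feature value per block.
    Returns {feature_value: [(x, y), ...]} with x, y in [0, 1].
    """
    m, n = len(labels), len(labels[0])
    groups = defaultdict(list)
    for row in range(m):
        for col in range(n):
            # Centroid of block (row, col), normalized by the grid size
            x = (col + 0.5) / n
            y = (row + 0.5) / m
            groups[labels[row][col]].append((x, y))
    return dict(groups)

fmap = point_feature_map([[0, 1],
                          [1, 1]])
```

Normalizing the coordinates makes the map independent of the image's pixel size; the angiogram itself is already scale-invariant, so this mainly keeps the point coordinates comparable across images.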



3 Latent Semantic Indexing 

In this section we describe an approach to automatic information indexing and 
retrieval, namely, latent semantic indexing (LSI). It is introduced to overcome a 
fundamental problem that plagues existing retrieval techniques that try to match 
words of queries with words of documents. The problem is that users want to retrieve 
on the basis of conceptual content, while individual words provide unreliable 
evidence about the conceptual meaning of a document. There are usually many ways 
to express a given concept. Therefore, the literal terms used in a user’s query may not 
match those of a relevant document. In addition, most words have multiple meanings 
and are used in different contexts. Hence, the terms in a user’s query may literally 
match the terms in documents that are not of any interest to the user at all. 
In information retrieval, these two problems are known as synonymy and
polysemy. The term synonymy is used in a very general sense to describe the fact
that there are many ways to refer to the same object. Users in different contexts, or 
with different needs, knowledge, or linguistic habits will describe the same concept 
using different terms. The prevalence of synonyms tends to decrease the recall 
performance of the retrieval. By polysemy we refer to the general fact that most words 
have more than one distinct meaning. In different contexts or when used by different 
people the same term takes on varying referential significance. Thus the use of a term 
in a query may not necessarily mean that a document containing the same term is 
relevant at all. Polysemy is one factor underlying poor precision performance of the 
retrieval [5]. 

Latent semantic indexing tries to overcome the deficiencies of term-matching 
retrieval by treating the unreliability of observed term-document association data as a 
statistical problem. It is assumed that there exists some underlying latent semantic 
structure in the data that is partially obscured by the randomness of word choice with 
respect to retrieval. Statistical techniques are used to estimate this latent semantic 
structure, and to get rid of the obscuring noise. By semantic structure we mean the 
correlation structure in which individual words appear in documents; semantic 
implies only the fact that terms in a document may be taken as referents to the 
document itself or to its topic. 

The latent semantic indexing technique makes use of the singular value 
decomposition (SVD). We take a large matrix of term-document association data and 
construct a semantic space wherein terms and documents that are closely associated 
are placed near to each other. Singular value decomposition allows the arrangement 
of the space to reflect the major associative patterns in the data, and ignore the 
smaller, less important influences. As a result, terms that did not actually appear in a 
document may still end up close to the document, if that is consistent with the major 
patterns of association in the data. Position in the space then serves as a new kind of 
semantic indexing. Retrieval proceeds by using the terms in a query to identify a 
point in the semantic space, and documents in its neighborhood are returned as 
relevant results to the query. 

Latent semantic indexing is based on the fact that the term-document association 
can be formulated by using the vector space model, in which each document is 
encoded as a vector, where each vector component reflects the importance of a 
particular term in representing the semantics of that document. The vectors for all the 
documents in a database are stored as the columns of a single matrix. Latent semantic 
indexing is a variant of the vector space model in which a low-rank approximation to 
the vector space representation of the database is employed. That is, we replace the 
original matrix by another matrix that is as close as possible to the original matrix but 
whose column space is only a subspace of the column space of the original matrix. 
Reducing the rank of the matrix is a means of removing extraneous information or 
noise from the database it represents. Rank reduction is used in various applications 
of linear algebra and statistics as well as in image processing, data compression, 
cryptography, and seismic tomography. According to [6], latent semantic indexing 
has achieved average or above average performance in several experiments with the
TREC collections. 



3.1 The Vector-Space Model 

In the vector space model, a vector is used to represent each item or document in a 
collection. Each component of the vector reflects a particular concept, keyword, or 
term associated with the given document. The value assigned to that component 
reflects the importance of the term in representing the semantics of the document. 
Typically, the value is a function of the frequency with which the term occurs in the 
document or in the document collection as a whole [7]. 

A database containing a total of d documents described by t terms is represented as
a t × d term-by-document matrix A. The d vectors representing the d documents form
the columns of the matrix. Thus, the matrix element $a_{ij}$ is the weighted frequency at
which term i occurs in document j. The columns of A are called the document vectors,
and the rows of A are the term vectors. The semantic content of the database is 
contained in the column space of A, meaning that the document vectors span that 
content. We can exploit geometric relationships between document vectors to model 
similarity and differences in content. Meanwhile, we can also compare term vectors 
geometrically in order to identify similarity and differences in term usage. 

A variety of schemes are available for weighting the matrix elements. The element
$a_{ij}$ of the term-by-document matrix A is often assigned values as $a_{ij} = l_{ij} g_i$.
The factor $g_i$ is called the global weight, reflecting the overall value of term i as an
indexing term for the entire collection. As one example, consider a very common term
like "image" within a collection of articles on image retrieval. It is not important to
include that term in the description of a document, as all of the documents are known
to be about images, so a small value of the global weight $g_i$ is appropriate. Global
weighting schemes range from simple normalization to advanced statistics-based
approaches [7]. The factor $l_{ij}$ is a local weight that reflects the importance of term i
within document j itself. Local weights range in complexity from simple binary values
to functions involving logarithms of term frequencies. The latter functions have a
smoothing effect in that high-frequency terms having limited discriminatory value are
assigned low weights.
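The weighting $a_{ij} = l_{ij} g_i$ can be made concrete with one common pairing, chosen here purely for illustration: a logarithmic local weight $l_{ij} = \log(1 + f_{ij})$ and an inverse-document-frequency global weight $g_i = \log(d / df_i)$, where $df_i$ is the number of documents containing term i. Neither choice is claimed to be the one used in this work, and the function name `weight_matrix` is hypothetical. The example reproduces the "image" case: a term occurring in every document receives global weight 0.

```python
import math

def weight_matrix(freq):
    """Apply a_ij = l_ij * g_i to a t x d matrix of raw term frequencies (list of rows).

    Local weight  l_ij = log(1 + f_ij)      (log of term frequency)
    Global weight g_i  = log(d / df_i)      (inverse document frequency, one common choice)
    """
    d = len(freq[0])
    weighted = []
    for row in freq:
        df = sum(1 for f in row if f > 0)        # number of documents containing term i
        g = math.log(d / df) if df else 0.0
        weighted.append([g * math.log1p(f) for f in row])
    return weighted

# Term 0 occurs in all 3 documents (like "image"); term 1 occurs only in document 2
W = weight_matrix([[1, 2, 3],
                   [0, 0, 4]])
```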



3.2 Singular-Value Decomposition

The singular value decomposition (SVD) is a dimension reduction technique which
gives us reduced-rank approximations to both the column space and row space of the
vector space model. The SVD also allows us to find a rank-k approximation to a
matrix A with minimal change to that matrix for a given value of k [6]. The
decomposition is defined as $A = U \Sigma V^T$, where U is the t × t orthogonal matrix
having the left singular vectors of A as its columns, V is the d × d orthogonal matrix
having the right singular vectors of A as its columns, and Σ is the t × d diagonal matrix
having the singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r \ge 0$ of the matrix A
in order along its diagonal, where r = min(t, d). This decomposition exists for any
given matrix A [8].

The rank of the matrix A is equal to the number of nonzero singular values. It
follows directly from the orthogonal invariance of the Frobenius norm that $\lVert A \rVert_F$
can be expressed in terms of those values:

$$\lVert A \rVert_F^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_r^2.$$
The first $r_A$ columns of matrix U are a basis for the column space of matrix A,
while the first $r_A$ rows of matrix $V^T$ are a basis for the row space of matrix A,
where $r_A$ denotes the rank of A. To create a rank-k approximation $A_k$ to the matrix
A, where $k < r_A$, we can set all but the k largest singular values of A to zero. A
classic theorem about the singular value decomposition states that, among all matrices
of rank at most k, the distance to the original matrix A is minimized by $A_k$. The
theorem further shows how the norm of that distance is related to the singular values
of matrix A:

$$\lVert A - A_k \rVert_F = \min_{\operatorname{rank}(X) \le k} \lVert A - X \rVert_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_{r_A}^2}.$$

Here $A_k = U_k \Sigma_k V_k^T$, where $U_k$ is the t × k matrix whose columns are the
first k columns of matrix U, $V_k$ is the d × k matrix whose columns are the first k
columns of matrix V, and $\Sigma_k$ is the k × k diagonal matrix whose diagonal elements
are the k largest singular values of matrix A.

How to choose the rank that provides optimal performance of latent semantic 
indexing for any given database remains an open question and is normally decided 
via empirical testing. For very large databases, the number of dimensions used 
usually ranges between 100 and 300. Normally, it is a choice made for computational 
feasibility as opposed to accuracy. Using the SVD to find the approximation $A_k$,
however, guarantees that the approximation is the best that can be achieved (in the
Frobenius norm) for any given choice of k.
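The rank-k approximation and the error formula above can be checked numerically. This sketch assumes NumPy is available and uses a small random matrix as a stand-in for a term-by-document matrix. The helper name `rank_k_approximation` is hypothetical. It uses the thin SVD (`full_matrices=False`), which yields the same $A_k$ as the full t × t / d × d decomposition described in the text.

```python
import numpy as np

def rank_k_approximation(A, k):
    """Return A_k = U_k Sigma_k V_k^T and the singular values of A (descending)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD; s sorted descending
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return A_k, s

rng = np.random.default_rng(0)
A = rng.random((8, 5))                    # toy 8 x 5 "term-by-document" matrix
A2, s = rank_k_approximation(A, 2)
error = np.linalg.norm(A - A2, "fro")
predicted = np.sqrt(np.sum(s[2:] ** 2))   # sqrt(sigma_3^2 + ... + sigma_r^2)
print(error, predicted)                   # agree up to floating-point rounding
```

The same script also confirms that $\lVert A \rVert_F^2$ equals the sum of the squared singular values.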



3.3 Similarity Measure 

In the vector space model, a user queries the database to find relevant documents, 
using the vector space representation of those documents. The query is also a set of 
terms, with or without weights, represented by using a vector just like the documents. 
It is likely that many of the terms in the database do not appear in the query, meaning 
that many of the query vector components are zero. Meanwhile, even though some of 
the terms in the query and in the documents are common, they may be used to refer to 
different concepts. Considering the general problems of synonymy and polysemy, we 
are trying to reveal the underlying semantic structure of the database and thus 
improve the query performance by using the latent semantic indexing technique. A 
query can be issued after the SVD has been performed on the database and an 
appropriate lower rank approximation has been generated. The matching process is to 
find the documents most similar to the query in the use and weighting of terms. In the 
vector space model, the documents selected are those geometrically closest to the
query in the transformed semantic space. 

One common measure of similarity is the cosine of the angle between the query
and document vectors. If the term-by-document matrix A has columns $a_j$,
j = 1, 2, ..., d, those d cosines are computed according to the following formula

$$\cos\theta_j = \frac{a_j^T q}{\lVert a_j \rVert_2 \, \lVert q \rVert_2} = \frac{\sum_{i=1}^{t} a_{ij} q_i}{\sqrt{\sum_{i=1}^{t} a_{ij}^2}\,\sqrt{\sum_{i=1}^{t} q_i^2}}$$

for j = 1, 2, ..., d, where the Euclidean vector norm $\lVert x \rVert_2$ is defined by

$$\lVert x \rVert_2 = \sqrt{x^T x} = \sqrt{\sum_{i=1}^{t} x_i^2}$$

for any t-dimensional vector x. Because the query and document vectors are typically
sparse, the dot product and norms are generally inexpensive to compute. Furthermore,
the document vector norms $\lVert a_j \rVert_2$ need to be computed only once for any given
term-by-document matrix. Note that multiplying either $a_j$ or q by a positive constant
does not change the cosine value; thus, we may scale the document vectors or the
queries by any convenient positive factor.
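The cosine computation, including the one-time caching of document norms, can be sketched in plain Python. The helper names `document_norms` and `cosines` are hypothetical, documents are given as a list of column vectors, and zero vectors (which would cause a division by zero) are assumed not to occur.

```python
import math

def document_norms(columns):
    """Euclidean norms ||a_j||_2, computed once per term-by-document matrix."""
    return [math.sqrt(sum(x * x for x in a)) for a in columns]

def cosines(columns, norms, q):
    """cos(theta_j) = a_j^T q / (||a_j||_2 ||q||_2) for every document vector a_j."""
    q_norm = math.sqrt(sum(x * x for x in q))
    return [sum(x * y for x, y in zip(a, q)) / (n * q_norm)
            for a, n in zip(columns, norms)]

cols = [[1, 0], [0, 1], [1, 1]]
norms = document_norms(cols)          # cached; reused for every query
c = cosines(cols, norms, [1, 0])
scaled = cosines([[2, 0]], document_norms([[2, 0]]), [5, 0])
```

The last line illustrates the scaling remark: multiplying the document by 2 and the query by 5 still gives a cosine of 1.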

With any given document database and user’s query, we can always generate the 
term-by-document matrix and then apply the singular value decomposition to this 
matrix. We hope to choose a good lower-rank approximation after the SVD and use
this transformed matrix to construct the semantic space of the database. Then, the 
query process will be to locate those documents geometrically closest to the query 
vector in the semantic space. 

The latent semantic indexing technique has been successfully applied to
information retrieval, where it has shown a distinctive ability to find the latent
correlation between terms and documents. This inspires us to attempt to borrow this
technique from traditional information retrieval and apply it to visual information 
retrieval. We hope to make use of the power of latent semantic indexing to reveal the 
underlying semantic nature of visual contents, and thus to find the correlation 
between visual features and semantics of visual documents or objects. We will 
explore this approach further by correlating low-level feature groups and high-level 
semantic clusters, hoping to figure out the semantic nature behind those visual 
features. Some preliminary experiments have been conducted and the results show 
that integrating latent semantic indexing with content-based retrieval is a promising 
approach. Details of these experiments are presented in the next section. 
4 Finding Latent Correlation between Visual Features and Semantics

Existing management systems for image collections and their users are typically at 
cross-purposes. While these systems normally retrieve images based on low-level 
features, users usually have a more abstract notion of what will satisfy them. Using 
low-level features to correspond to high-level abstractions is one aspect of the 
semantic gap [9] between content-based system organization and the concept-based 
user. Sometimes, the user has in mind a concept so abstract that he himself doesn’t 
know what he wants until he sees it. At that point, he may want images similar to 
what he has just seen or can envision. Again, however, the notion of similarity is 
typically based on high-level abstractions, such as activities taking place in the image 
or evoked emotions. Standard definitions of similarity using low-level features 
generally will not produce good results. 

In reality, the correspondence between user-based semantic concepts and system- 
based low-level features is many-to-many. That is, the same semantic concept will 
usually be associated with different sets of image features. Also, for the same set of 
image features, different users could easily find dissimilar images relevant to their 
needs, such as when their relevance depends directly on an evoked emotion. 

In this section, we present the results of a series of experiments that seek to
transform low-level features to a higher level of meaning. This study concerns a
technique, latent semantic analysis, which has been used for information retrieval for
many years. In this environment, this technique determines clusters of co-occurring
keywords, sometimes called concepts, so that a query which uses a particular
keyword can then retrieve documents perhaps not containing this keyword, but 
containing other keywords from the same cluster. In this preliminary study, we 
examine the use of this technique for content-based image retrieval to find the 
correlation between visual features and semantics. 



4.1 The Effects of Latent Semantic Indexing, Normalization, and Weighting
for Global and Subimage Color Histograms

In this and the next section, we show the improvement that latent semantic analysis, 
normalization, and weighting can give to two simple and straightforward image 
retrieval techniques, both of which use standard color histograms. For our 
experiments, we use a database of 50 JPEG images, each of size 192 x 128. This 
image collection consists of ten semantic categories of five images each. The 
categories consist of: ancient towers, ancient columns, birds, horses, pyramids, 
rhinos, sailing scenes, skiing scenes, sphinxes, and sunsets. 

Our first approach uses global color histograms. Each image is first converted from 
the RGB color space to the HSV color space. For each pixel of the resulting image, 
hue and saturation are extracted and each quantized into a 10-bin histogram. Then the 
two histograms h and s are combined into one h × s histogram with 100 bins, which is
the feature vector representing each image. This is a vector of 100 elements,
$V = [f_1, f_2, \ldots, f_{100}]^T$, where each element corresponds to one of the bins in
the hue-saturation histogram.

We then generate the feature-image matrix, $A = [V_1, \ldots, V_{50}]$, which is 100 × 50.
Each row corresponds to one of the elements in the feature vector and each column is the
entire feature vector of the corresponding image. This matrix is written into a file so 
the computation is done only once. The matrix will be retrieved from the file during 
the query process. 

A singular value decomposition is then performed on the feature-image matrix.
The result comprises three matrices, U, S, and V, where $A = U S V^T$. The dimensions
of U are 100 × 100, S is 100 × 50, and V is 50 × 50. The rank of matrix S, and thus the
rank of matrix A, in our case is 50. Therefore, the first 50 columns of U span the
column space of A and all 50 rows of $V^T$ span the row space of A. S is a diagonal
matrix whose diagonal elements are the singular values of A. To reduce the
dimensionality of the transformed latent semantic space, we use a rank-k
approximation, $A_k$, of the matrix A, for k = 34, which worked better than the other
values we tried. This is defined by $A_k = U_k S_k V_k^T$. The dimensions of $A_k$ are the
same as those of A, 100 × 50. The dimensions of $U_k$, $S_k$, and $V_k$ are 100 × 34,
34 × 34, and 50 × 34, respectively.
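The decomposition above, combined with the cosine-based query over all 50 images, can be sketched with NumPy. The text does not say exactly how an image's "transformed feature vector" is obtained, so this sketch assumes it is the image's column of $A_k$. The function name `lsi_rank` is hypothetical, and a random non-negative matrix stands in for the real 100 × 50 feature-image matrix.

```python
import numpy as np

def lsi_rank(A, k, query_index):
    """Rank all images by cosine similarity to image `query_index` in the rank-k space.

    Assumption: an image's transformed feature vector is its column of A_k.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]      # same shape as A (features x images)
    norms = np.linalg.norm(A_k, axis=0)               # ||d|| for every image, computed once
    q = A_k[:, query_index]
    sims = (A_k.T @ q) / (norms * np.linalg.norm(q))  # q^T d / (||q|| ||d||)
    return np.argsort(-sims, kind="stable")           # most similar image first

rng = np.random.default_rng(1)
A = rng.random((100, 50))        # stand-in for the 100 x 50 feature-image matrix
order = lsi_rank(A, k=34, query_index=7)
```

The query image itself always ranks first, since its cosine with itself is 1; this matches the evaluation below, where the best case places the five images of the query's category in positions 1 through 5.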

The query process in this approach is to compute the distance between the
transformed feature vector of the query image, q, and that of each of the 50 images in
the database, d. This distance is defined as $\mathrm{dist}(q, d) = q^T d / (\lVert q \rVert \, \lVert d \rVert)$,
where $\lVert q \rVert$ and $\lVert d \rVert$ are the norms of those vectors; it is in fact a cosine
similarity, so larger values indicate closer matches. The computation of $\lVert d \rVert$ for
each of the 50 images is done only once and then written into a file. Using each image
as a query, in turn, we find the average sum of the positions of all of the five correct
answers. Note that in the best case, where the five correct matches occupy the first
five positions, this average sum would be 15 (1 + 2 + 3 + 4 + 5), whereas in the worst
case, where the five correct matches occupy the last five positions, this average sum
would be 240 (46 + 47 + 48 + 49 + 50). We define a measure of how good a particular
method is as

$$\text{measure-of-goodness} = \frac{240 - \text{average-sum}}{225}.$$

We note that in the best case, this measure is equal to 1, whereas in the worst case, 
it is equal to 0. 
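The evaluation measure can be written as two small functions. The names `position_sum` and `measure_of_goodness` are hypothetical; the defaults of 50 images and 5 relevant images per query follow the experimental setup, and the best and worst sums (15 and 240) are derived from them rather than hard-coded.

```python
def position_sum(ranking, relevant):
    """Sum of the 1-based positions at which the relevant images appear in a ranking."""
    return sum(pos for pos, img in enumerate(ranking, start=1) if img in relevant)

def measure_of_goodness(average_sum, n_images=50, n_relevant=5):
    """Map an average position sum to [0, 1]: 1 for the best case, 0 for the worst."""
    best = n_relevant * (n_relevant + 1) // 2                     # 1 + ... + 5 = 15
    worst = sum(range(n_images - n_relevant + 1, n_images + 1))   # 46 + ... + 50 = 240
    return (worst - average_sum) / (worst - best)

ranking = list(range(50))                        # a ranking of image ids 0..49
best_sum = position_sum(ranking, {0, 1, 2, 3, 4})
worst_sum = position_sum(ranking, {45, 46, 47, 48, 49})
```

In the experiments, `position_sum` would be averaged over all 50 queries before being passed to `measure_of_goodness`.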

This approach was then compared to one without using latent semantic analysis. 
We also wanted to see whether the standard techniques of normalization and term 
weighting from text retrieval would work in this environment. 

The following normalization process will assign equal emphasis to each 
component of the feature vector. Different components within the vector may be of 
totally different physical quantities. Therefore, their magnitudes may vary drastically 
and thus bias the similarity measurement significantly. One component may 
overshadow the others just because its magnitude is relatively too large. For the 
feature-image matrix $A = [V_1, \ldots, V_{50}]$, we have $A_{i,j}$, which is the i-th
component in vector $V_j$. Assuming a Gaussian distribution, we can obtain the mean
$\mu_i$ and standard deviation $\sigma_i$ for the i-th component of the feature vector across
the whole image database. We then normalize the original feature-image matrix into
the range [-1, 1] as follows:

$$A_{i,j} = \frac{A_{i,j} - \mu_i}{\sigma_i}.$$

Under the Gaussian assumption, the probability of an entry falling into the range
[-1, 1] is approximately 68%. In practice, we map all the entries into the range [-1, 1]
by forcing the out-of-range values to be either -1 or 1. We then shift the entries into
the range [0, 1] by using the following formula:

$$A_{i,j} = \frac{A_{i,j} + 1}{2}.$$

After this normalization process, each component of the feature image matrix is a 
value between 0 and 1, and thus will not bias the importance of any component in the 
computation of similarity. 
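The three normalization steps (standardize each component, clip to [-1, 1], shift to [0, 1]) can be sketched per row of the feature-image matrix. Two details are assumptions not stated in the text: the population standard deviation is used, and a component that is constant across all images (σ = 0) is mapped to 0.5. The function name `gaussian_normalize` is hypothetical.

```python
import statistics

def gaussian_normalize(A):
    """Normalize each row (feature component) of A into [0, 1].

    Per row i: z = (A_ij - mu_i) / sigma_i, clip z to [-1, 1], then (z + 1) / 2.
    Assumptions: population standard deviation; a constant row maps to 0.5.
    """
    out = []
    for row in A:
        mu = statistics.fmean(row)
        sigma = statistics.pstdev(row)
        normalized = []
        for x in row:
            z = (x - mu) / sigma if sigma > 0 else 0.0
            z = max(-1.0, min(1.0, z))            # force out-of-range values to -1 or 1
            normalized.append((z + 1.0) / 2.0)    # shift from [-1, 1] to [0, 1]
        out.append(normalized)
    return out

N = gaussian_normalize([[1, 2, 3],
                        [5, 5, 5]])
```

In the first row the z-scores are about -1.22, 0, and 1.22, so the two extremes are clipped before the shift.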

One of the common and effective methods for improving full-text retrieval 
performance is to apply different weights to different components [7]. We apply these 
techniques to our image environment. The raw frequency in each component of the 
feature image matrix, with or without normalization, can be weighted in a variety of 
ways. Both global weight and local weight are considered in our approach. A global 
weight indicates the overall importance of that component in the feature vector across 
the whole image collection. Therefore, the same global weighting is applied to an 
entire row of the matrix. A local weight is applied to each element, indicating the
relative importance of the component within its vector. The value for any component
$A_{i,j}$ is thus $L(i, j)G(i)$, where $L(i, j)$ is the local weighting for feature
component i in image j, and $G(i)$ is the global weighting for that component.

Common local weighting techniques include term frequency, binary, and log of
term frequency, whereas common global weighting methods include Normal, GfIdf,
Idf, and Entropy. Previous research has found that log of term frequency helps to
dampen the effects of large differences in frequency and thus has the best
performance as a local weight, whereas Entropy is the most appropriate method for
global weighting [7].

The entropy method is defined by assigning component i the global weight

$$G(i) = 1 + \sum_{j} \frac{p_{ij} \log(p_{ij})}{\log(\text{number\_of\_images})},$$

where $p_{ij} = tf_{ij} / gf_i$ is the probability of that component, $tf_{ij}$ is the raw
frequency of component $A_{i,j}$, and $gf_i$ is the global frequency, i.e., the total
number of times that component i occurs in the whole collection.
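The log-entropy scheme can be sketched as follows. The entropy global weight follows the formula above, with the usual convention $0 \log 0 = 0$. For the local weight, the form $L(i, j) = \log(1 + tf_{ij})$ is assumed as one common variant of "log of term frequency". Rows whose entries sum to zero are assumed to keep weight 1, which is harmless since their entries stay 0. The function names are hypothetical, and at least two images are required so that log(number_of_images) is nonzero.

```python
import math

def entropy_global_weight(row):
    """G(i) = 1 + sum_j p_ij log(p_ij) / log(n), with p_ij = tf_ij / gf_i."""
    n = len(row)                      # number of images; must be at least 2
    gf = sum(row)
    if gf == 0:
        return 1.0                    # assumption: all-zero component keeps weight 1
    total = sum((tf / gf) * math.log(tf / gf) for tf in row if tf > 0)  # 0 log 0 := 0
    return 1.0 + total / math.log(n)

def log_entropy_weight(A):
    """A_ij <- L(i, j) * G(i), with L(i, j) = log(1 + tf_ij) as the log local weight."""
    weighted = []
    for row in A:
        g = entropy_global_weight(row)      # one global weight per row (component)
        weighted.append([math.log1p(tf) * g for tf in row])
    return weighted

W = log_entropy_weight([[4, 0, 0, 0]])
```

A component spread evenly over all images gets G close to 0, while one concentrated in a single image gets G = 1, which is how the scheme de-emphasizes components that occur in many images.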

The global weights give less emphasis to those components that occur frequently 
or in many images. Theoretically, the entropy method is the most sophisticated 
weighting scheme and it takes the distribution property of feature components over 
the image collection into account. 

We conducted similar experiments for these four cases: 

1. Global color histograms, no normalization, no term weighting, no latent-semantic 
indexing (raw data) 
2. Global color histograms, normalized and term-weighted, no latent semantic
indexing 

3. Global color histograms, no normalization, no term-weighting, with latent 
semantic indexing 

4. Global color histograms, normalized and term-weighted, with latent semantic 
indexing 

The results are shown in Table 1, where each table entry is a measure-of-goodness 
of the corresponding technique. We note that the improvements under LSI do not seem
very large. This is an artifact of the small size and nature of our database and of the
fact that all of the techniques mentioned work well. It is, however, an indication that
LSI is a technique worthy of further study in this environment.



Table 1. Results for Global Color Histogram and Color Angiogram Representations

                                          Global Color    Color
                                          Histogram       Angiogram
Raw Data                                  0.9257          0.9508
Raw Data with LSI                         0.9377          0.9556
Normalized and Weighted Data              0.9419          0.9272
Normalized and Weighted Data with LSI     0.9446          0.9284


Thus, for the global histogram approach, using normalized and weighted data or 
using latent semantic indexing with the raw data improves performance, while using 
both techniques is even better. 

Our next approach uses sub-image matching in conjunction with color histograms. 
Each image is first converted from the RGB color space to the HSV color space. Each 
image is decomposed into 5 overlapping subimages, as shown in Figure 1. For the 50 
images in our case, 250 subimages will be used in the following feature extraction 
process. For each pixel of a subimage, hue and saturation are extracted and each is
quantized into a 10-bin histogram. Then the two histograms h and s are combined into
one h × s histogram with 100 bins, which is the feature vector representing each
subimage. This is a vector of 100 elements, $V = [f_1, f_2, \ldots, f_{100}]^T$, where
each element corresponds to one of the bins in the hue-saturation histogram.

We then generate the feature-subimage matrix, $A = [V_1, \ldots, V_{250}]$, which is
100 × 250. Each row corresponds to one of the elements in the feature vector and each
column is the whole feature vector of the corresponding subimage. This matrix is 
written into a file so the computation is done only once. The matrix will be retrieved 
from the file during the query process. 

A singular value decomposition is then performed on the feature-subimage matrix.
The result comprises three matrices, U, S, and V, where $A = U S V^T$. The dimensions
of U are 100 × 100, S is 100 × 250, and V is 250 × 250. The rank of matrix S, and thus
the rank of matrix A, in our case is 100. Therefore, the first 100 columns of U span
the column space of A and the first 100 rows of $V^T$ span the row space of A. S is a
diagonal matrix whose diagonal elements are the singular values of A. To reduce the
dimensionality of the transformed latent semantic space, we use a rank-k
approximation, $A_k$, of the matrix A, for k = 55. This is defined by
$A_k = U_k S_k V_k^T$. The dimensions of $A_k$ are the same as those of A, 100 × 250.
The dimensions of $U_k$, $S_k$, and $V_k$ are 100 × 55, 55 × 55, and 250 × 55,
respectively.

The first step of the query process in this approach is to compute the distance
between the transformed feature vector of each subimage of the query image, q, and
that of each of the 250 subimages in the database, d. This distance is defined as
$\mathrm{dist}(q, d) = q^T d / (\lVert q \rVert \, \lVert d \rVert)$, where $\lVert q \rVert$ and $\lVert d \rVert$ are
the norms of those vectors. The computation of $\lVert d \rVert$ for each of the 250 subimages
is done only once and then written into a file.

With respect to the query image and each of the 50 database images, we now have
the distances between each pair of subimages from the previous step. These distance
values $\mathrm{dist}(q_i, d_i)$ are then combined into one distance value between the two
images, in an approach similar to the computation of Euclidean distance. Given a query
image q, with corresponding subimages $q_1, \ldots, q_5$, and a candidate database image
d, with corresponding subimages $d_1, \ldots, d_5$, we define

dist{q, — ^ 




This approach was again compared to one without using latent semantic analysis. 
Each image is decomposed into five subimages which are then represented by their 
hue-saturation histograms V. Then the cosine measure between corresponding 
subimages is computed and used as the similarity metric. We thus have the similarities
between corresponding subimages of the query image and of each of the 50 database
images. These similarity values are then combined into one similarity measure between
the two images. Given a query image q and a candidate image d in the database, we define



$$\mathrm{dist}(q, d) = \frac{1}{5} \sum_{i=1}^{5} \mathrm{sim}(q_i, d_i)$$



Using each image as a query, we again find the average sum of the positions of all five correct answers. With the measure previously introduced, the result without latent semantic analysis is 0.9452, while latent semantic analysis raises this measure to 0.9502.

We also ran a similar experiment in which dist(q, d) weighted the center subimage twice as much as the peripheral subimages. Using the same measure, the results are 0.9475 without latent semantic analysis and 0.9505 with latent semantic analysis. Therefore, latent semantic indexing improves retrieval performance for both global and subimage color-histogram-based retrieval. A comparison of the global and subimage results shows that subimage histograms perform better than the global color histogram, with or without latent semantic indexing.
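The combination of per-subimage scores into one image-level score, including the center-weighted variant, can be sketched as a weighted mean. This is an assumption-laden illustration: the index of the center subimage (0 here), the toy similarity values, and the helper name `combine` are hypothetical, and the `rms=True` option shows the Euclidean-style root-mean-square reading of the combination rather than a confirmed formula from the paper.

```python
import math


def combine(sims, weights=None, rms=False):
    """Combine per-subimage similarities into one image-level score.

    rms=False: weighted mean of the similarities.
    rms=True:  weighted root-mean-square (a Euclidean-style combination).
    """
    if weights is None:
        weights = [1.0] * len(sims)
    total = sum(weights)
    if rms:
        return math.sqrt(sum(w * s * s for w, s in zip(weights, sims)) / total)
    return sum(w * s for w, s in zip(weights, sims)) / total


# Assumption: index 0 is the center subimage, indices 1-4 are peripheral.
sims = [0.9, 0.6, 0.7, 0.8, 0.5]
plain = combine(sims)                            # equal weights
center = combine(sims, weights=[2, 1, 1, 1, 1])  # center counted twice
```

Dividing by the sum of the weights keeps the combined score on the same scale as the individual cosines, so weighted and unweighted results remain comparable.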
4.2 The Effects of Latent Semantic Indexing, Normalization, and Weighting for Color Angiograms

Our next approach performs similar experiments using our previously formulated approach of color angiograms [10]. This is a novel spatial color indexing scheme based on the point feature map obtained by dividing an image evenly into M × N non-overlapping blocks, with each block abstracted as a unique feature point labeled with its spatial location, dominant hue, and dominant saturation.

For our experiments, we divide the images into 8 × 8 blocks, use 10 quantized hue values and 10 quantized saturation values, count the two largest angles of each Delaunay triangle, and use an angiogram bin width of 5°. Since the angles of a triangle lie between 0° and 180°, each angiogram has 36 bins, and our vector representation of an image thus has 720 elements: 36 angle bins for each of the 10 hue ranges and 36 angle bins for each of the 10 saturation ranges. We use the same approach to querying as in the previous section.
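The angle-histogram part of this representation can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes the Delaunay triangles for each quantized hue (and saturation) value have already been computed and are passed in as triples of points (a library such as SciPy's `scipy.spatial.Delaunay` could supply them), and all function names are hypothetical.

```python
import math

BIN_DEG = 5                      # angiogram bin width in degrees
N_BINS = 180 // BIN_DEG          # 36 bins cover every possible triangle angle
N_HUE, N_SAT = 10, 10            # quantized hue and saturation values


def triangle_angles(p, q, r):
    """Interior angles (degrees) of triangle pqr at p, q, r (law of cosines)."""
    a, b, c = math.dist(q, r), math.dist(p, r), math.dist(p, q)

    def angle(opp, s1, s2):
        cos_t = (s1 * s1 + s2 * s2 - opp * opp) / (2 * s1 * s2)
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

    return [angle(a, b, c), angle(b, a, c), angle(c, a, b)]


def angle_histogram(triangles):
    """Histogram the two largest angles of each triangle into N_BINS bins."""
    hist = [0] * N_BINS
    for p, q, r in triangles:
        for theta in sorted(triangle_angles(p, q, r))[1:]:  # two largest angles
            hist[min(int(theta // BIN_DEG), N_BINS - 1)] += 1
    return hist


def angiogram_vector(hue_triangles, sat_triangles):
    """Concatenate 10 hue and 10 saturation histograms: 10*36 + 10*36 = 720."""
    vec = []
    for tris in hue_triangles + sat_triangles:
        vec.extend(angle_histogram(tris))
    return vec
```

Each triangle contributes exactly two counts, so the sum of one histogram is twice the number of triangles for that hue or saturation value.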

We conducted similar experiments for these four cases:

1. Color angiograms, no normalization, no term weighting, no latent semantic indexing (raw data)

2. Color angiograms, normalized and term-weighted, no latent semantic indexing

3. Color angiograms, no normalization, no term weighting, with latent semantic indexing

4. Color angiograms, normalized and term-weighted, with latent semantic indexing

The results are shown in Table 1.

From these results, one notices that our angiogram method performs better than the standard global color histogram, which is consistent with our previous results [10,11]. One also notices that latent semantic indexing improves the performance of this method. However, normalization and weighting appear to have a negative impact on query performance. We examined the impact of these techniques more thoroughly and derived the data shown in Table 2.



Table 2. More Detailed Results for Color Angiogram Representation

Color Angiogram
Raw Data with LSI                          0.9556
Normalized Data with LSI                   0.9476
Weighted Data with LSI                     0.9529
Normalized and Weighted Data with LSI      0.9284



The impact of normalization is worse than that of weighting. Normalization is a compacting process which transforms the original feature image matrix (the angiogram elements) to the range [0, 1]. The feature image matrix in this case is a sparse matrix with many 0s, some small integers, and a relatively small number of large integers. We believe that these large integers represent the discriminatory power of the angiogram and that the compacting effect of normalization weakens their significance. Local log-weighting also has a compacting effect. Since both the local and global weighting factors lie between 0 and 1, the transformed matrix always has smaller values than the original one, even when no normalization is applied. Thus, for color angiograms, normalization and weighting do not improve performance; they actually make it worse.
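The compacting effect of local log-weighting can be illustrated numerically. This sketch assumes the common LSI local weight log2(1 + x); the exact weighting scheme is defined earlier in the paper and may differ, and the counts 3 and 63 are toy values chosen so the logarithms are exact.

```python
import math


def log_local_weight(x):
    """Assumed local weight log2(1 + x), applied to a nonnegative count x."""
    return math.log2(1 + x)


small, large = 3, 63                 # toy angiogram counts (hypothetical)
raw_ratio = large / small            # 63 / 3 = 21
weighted_ratio = log_local_weight(large) / log_local_weight(small)  # 6 / 2 = 3
```

The large count dominates the small one by a factor of 21 in the raw data but only by a factor of 3 after weighting, which is the loss of discriminatory power described above. For nonnegative integer counts, log2(1 + x) never exceeds x, consistent with the observation that the weighted matrix has smaller values.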



4.3 Utilizing Image Annotations 

We conducted various experiments to determine whether image annotations could 
improve the query results of our various techniques. The results indicate that they 
can. 

For both the global color histogram and color angiogram representations, we appended an extra 15 elements (called category hits) to each feature vector to accommodate the following 15 keywords associated with these images: sky, sun, land, water, boat, grass, horse, rhino, bird, human, pyramid, column, tower, sphinx, and snow. Thus, the feature vector for the global histogram representation now has 115 elements (100 visual elements and 15 textual elements), while the feature vector for the color angiogram representation now has 735 elements (720 visual elements and 15 textual elements). Each image is annotated with appropriate keywords and the area coverage of each of these keywords. For instance, one of the images is annotated with sky(0.55), sun(0.15), and water(0.30). This is a very simple model for incorporating annotation keywords. One of the strengths of the LSA technique is that it is a vector-based method, which lets us easily integrate different features into one feature vector and treat them as similar components. Hence, ostensibly, we can apply the normalization and weighting mechanisms introduced in the previous sections to the expanded feature image matrix without any concern.
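Building the expanded feature vector can be sketched as below. The keyword order follows the list in the text; placing the category hits after the visual elements matches the element numbering used later (elements 101 through 115). The helper name `annotated_vector` and the all-zero placeholder histogram are hypothetical.

```python
KEYWORDS = ["sky", "sun", "land", "water", "boat", "grass", "horse", "rhino",
            "bird", "human", "pyramid", "column", "tower", "sphinx", "snow"]


def annotated_vector(visual, coverage):
    """Append 15 category-hit elements (keyword area coverage) to a visual vector."""
    unknown = set(coverage) - set(KEYWORDS)
    if unknown:
        raise ValueError(f"unknown keywords: {sorted(unknown)}")
    return list(visual) + [coverage.get(k, 0.0) for k in KEYWORDS]


histogram = [0.0] * 100  # placeholder global color histogram
vec = annotated_vector(histogram, {"sky": 0.55, "sun": 0.15, "water": 0.30})
print(len(vec), vec[100:104])  # prints: 115 [0.55, 0.15, 0.0, 0.3]
```

The same helper yields the 735-element angiogram vector when given a 720-element visual vector.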

For the global color histogram representation, we start with an image feature matrix of size 115 × 50. Then, using the SVD, we again compute the rank-34 approximation to this matrix, which is also 115 × 50. For each query image, we fill elements 101 through 115 with 0s. We also fill the last 15 rows of the transformed image feature matrix with 0s. Thus, the querying itself does not use any annotation information. As before, we apply normalization and weighting, since this improves the results for global color histograms. The results are shown in Table 3. The first two results are from Table 1, while the last result shows how our technique of incorporating annotation information improves the querying process.



Table 3. Global Color Histograms with Annotation Information

Global Color Histogram
Normalized and Weighted Data                                        0.9419
Normalized and Weighted Data with LSI                               0.9446
Normalized and Weighted Data with LSI and Annotation Information    0.9465
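The masking procedure described above can be sketched with NumPy. Assumptions: the 115 × 50 matrix is random placeholder data, normalization and weighting are omitted for brevity, and `lsi_scores` is a hypothetical helper. The point is only that the annotation rows take part in the SVD but are zeroed in both the rank-34 matrix and the query before cosine scores are computed.

```python
import numpy as np

N_VISUAL, N_TEXT, K = 100, 15, 34


def lsi_scores(features, query, k=K, n_text=N_TEXT):
    """Cosine scores of a visual-only query against a rank-k LSI matrix
    whose annotation rows are zeroed before querying."""
    u, s, vt = np.linalg.svd(features, full_matrices=False)
    a_k = u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]  # rank-k approximation, same shape
    a_k[-n_text:, :] = 0.0                       # drop annotation rows for querying
    q = np.asarray(query, dtype=float).copy()
    q[-n_text:] = 0.0                            # the query carries no annotations
    denom = np.linalg.norm(a_k, axis=0) * np.linalg.norm(q)
    return (q @ a_k) / denom


rng = np.random.default_rng(1)
features = rng.random((N_VISUAL + N_TEXT, 50))  # placeholder 115 x 50 matrix
query = features[:, 0]                          # database image 0 used as query
scores = lsi_scores(features, query)
ranking = np.argsort(-scores)                   # best match first
```

Because the annotation rows influence the singular vectors, the visual rows of the rank-k matrix can reflect keyword co-occurrence even though the annotations themselves are zeroed at query time.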





For the color angiogram representation, we start with an image feature matrix of size 735 × 50. Then, using the SVD, we again compute the rank-34 approximation to this matrix, which is also 735 × 50. For each query image, we fill elements 721 through 735 with 0s. We also fill the last 15 rows of the transformed image feature matrix with 0s. Thus, the querying itself does not use any annotation information. As before, we do not apply normalization and weighting, since omitting them gives better results for color angiograms. The results are shown in Table 4. The first two results are from Table 1, while the last result shows how our technique of incorporating annotation information improves the querying process.



Table 4. Color Angiograms with Annotation Information

Color Angiogram
Raw Data                                         0.9508
Raw Data with LSI                                0.9556
Raw Data with LSI and Annotation Information     0.9590



Note that annotations improve the query process for color angiograms, even 
though we do not normalize the various vector components, nor weight them. This is 
quite surprising, given that the feature image vector consists of 720 visual elements, 
which are relatively large integers, and only 15 annotation elements, which are in the 
range [0,1]. 



5 Conclusion and Future Work 

In this chapter we proposed image retrieval schemes that incorporate multiple visual features, represented by color histograms and color angiograms. Features are extracted at both the whole-image level and the subimage level to better capture salient object descriptions. To bridge the gap between low-level visual features and high-level concepts, latent semantic indexing is applied and integrated with these content-based retrieval techniques in a vector space model. The correlation between visual features and semantics is explored. Annotations are also fused into the feature vectors to improve the efficiency and effectiveness of the retrieval process.

The results presented in the previous section are quite interesting and are certainly 
worthy of further study. Our hope is that latent semantic analysis will find that 
different image features co-occur with similar annotation keywords, and consequently 
lead to improved techniques of semantic image retrieval. We are currently 
experimenting with the integration of shape angiograms, color angiograms, and 
structural features with latent semantic indexing and developing a unified framework 
to accommodate multiple features and their representation. We will further test and 
benchmark this integrated image retrieval framework over various large image 
databases, along with tuning the latent semantic indexing scheme to achieve optimal 
performance with highly reduced dimensionality. We will continue our study of image semantics and the incorporation of textual annotations, and explore the correlation between visual feature groups and semantic clusters. We are also considering applying various clustering techniques and using the cluster identifier in place of annotation information. Analyzing the patterns of user interaction, either in the query process or in the browsing process, is another interesting research topic. Relevance feedback could also be used to infer user preferences and thereby improve retrieval performance. Finally, considering that the image archives on the internet are
normally associated with other sources of information such as captions, titles, labels, 
and surrounding texts, we also propose to extend the application of the latent 
semantic indexing technique to analyze the structure of different types of visual and 
hypermedia documents. 
Abstract. We deal with the problem of rule interpolation and rule extrapolation for fuzzy and possibilistic systems. Such systems are used for representing and processing vague linguistic If-Then rules, and they have been increasingly applied in control engineering, pattern recognition, and expert systems. The methodology of rule interpolation is required for deducing plausible conclusions from sparse (incomplete) rule bases. For this purpose, the well-known fuzzy inference mechanisms have to be extended or replaced by more general ones. The methods for rule interpolation proposed so far in the literature are mainly conceived for application to fuzzy control and lack certain logical characteristics of an inference. This motivates the search for a more flexible method that is superior to the existing ones with respect to its general applicability to both fuzzy and possibilistic systems.

First, a set of axioms is proposed. With this, a definition is given for the notions of interpolation, extrapolation, linear interpolation, and linear extrapolation of fuzzy rules. The axioms include all the conditions that have been of interest in previous attempts, as well as others which either have logical characteristics or try to capture the linearity of the interpolation. A new method for linear interpolation and extrapolation of compact fuzzy quantities of the real line is suggested and analyzed in the spirit of the given definition. The method is extended to non-linear interpolation and extrapolation. Finally, the method is extended to the general case, where the input space is n-dimensional, by using the concept of aggregation operators.

Keywords: Knowledge-based systems, Fuzzy sets, Sparse rule base, Inference, Interpolation/extrapolation of fuzzy rules, Approximate reasoning, Expert system, Fuzzy control



1 Summary of the Topic 

Fuzzy set theory is a formal framework for modeling input-output relations for which only vague/linguistic information is available. The fact that complex systems in particular can often be described this way more transparently and efficiently than by physical-mathematical models has led to a broad spectrum of applications of fuzzy set based methods. Expert and database systems, operations research, image processing, and control engineering are examples where fuzzy systems have been successfully applied.

Historically, fuzzy systems were constructed from the linguistic If-Then rules of a human expert. Recently, learning techniques have increasingly been developed and applied to the construction of fuzzy If-Then rules from numerical sample data. The major advantages of fuzzy models are their modularity (each rule can be designed separately) and the semantics they rely on, by which numerical as well as subjective, linguistic information can be integrated within one system. With either method of constructing rule bases, it can happen that no rule is specified for certain input parameters. In the case of learning techniques, the sample data may not sufficiently represent input parameters that occur only infrequently. In the case of asking a human expert, an incomplete rule base can be the consequence of missing experience with particular system configurations. Another reason for an incomplete rule base may be the following: sometimes human experts are only able (or willing) to state explicit rules for prototypical system configurations. In this case, rules for other configurations have to be derived from the prototypical ones by analogical reasoning. It should be pointed out that representing knowledge by stating only prototypical rules and applying analogical reasoning is especially transparent and efficient.

In any case, rule interpolation may be considered an “inference technique” for fuzzy rule bases whose premises do not cover the whole input space. Of course, any inference mechanism has to satisfy certain logical criteria. It may be said that the methods proposed so far in the literature are mainly conceived for application to fuzzy control and that they are not appropriate for applications where the logical aspects of approximate reasoning are intrinsically important. All this motivates the search, in this paper, for a more flexible method which is superior to the proposed ones with respect to its general applicability to fuzzy as well as possibilistic systems.

After an axiomatic treatment of linear (and non-linear) interpolation and extrapolation, we introduce a new method for linear interpolation of compact fuzzy quantities of $\mathbb{R}$. We extend it to extrapolation and to non-linear (e.g., quadratic) interpolation and extrapolation, and we propose a corresponding defuzzification method. Finally, we extend the method to the general case of an n-dimensional input space. The details can be found in [4] and [5]. Only the axiomatic basis is presented below:



1.1 Conditions on Rule Interpolation/Extrapolation 

We postulate a set of axioms and a definition for interpolation, extrapolation, 
linear interpolation and linear extrapolation of fuzzy sets. 




Inference in Rule-Based Systems by Interpolation and Extrapolation 






In the following, let $\mathcal{X}$ denote an input space and $\mathcal{Y}$ an output space. Denote the sets of valid fuzzy subsets of $\mathcal{X}$ and $\mathcal{Y}$ by $\mathcal{F}_V(\mathcal{X})$ and $\mathcal{F}_V(\mathcal{Y})$, respectively. Further, let us consider a rule base $\mathcal{R}$ of m rules of the form

If X = $A_i$ Then Y = $B_i$,

where $A_i \in \mathcal{F}_V(\mathcal{X})$ and $B_i \in \mathcal{F}_V(\mathcal{Y})$ for all $1 \le i \le m$. Formally, a rule interpolation $\mathcal{I} : \mathcal{F}_V(\mathcal{X}) \to \mathcal{F}_V(\mathcal{Y})$ is a mapping which assigns to an observation $A \in \mathcal{F}_V(\mathcal{X})$ an interpolating (plausible) conclusion $\mathcal{I}(A) \in \mathcal{F}_V(\mathcal{Y})$. If the rule interpolation $\mathcal{I}$ is expected to behave as an inference mechanism in the sense of approximate reasoning, at least the following conditions have to be satisfied:

[I0] Validity of the conclusion

The conclusion should be a fuzzy subset of the universe $\mathcal{Y}$ with a valid membership function. This means that “membership functions” like those in Figure 1 are not allowed. We felt it necessary to postulate this, since some of the existing methods do not satisfy even this very elementary condition. Usually, further conditions for validity are required too: for example, when a method is restricted to the use of, e.g., trapezoidal membership functions, the result may be expected to be trapezoidal too. The normality of the quantities is frequently assumed as well.




Fig. 1. Invalid “membership functions” 

[I1] Compatibility with the rule base

For all $i \in \{1, \ldots, m\}$ and all $A \in \mathcal{F}_V(\mathcal{X})$, it follows from $A = A_i$ that $\mathcal{I}(A) = B_i$. This condition is the modus ponens in logic.

The following condition is essential for any logical inference. 

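[I2] Monotonicity condition

If $A^* \in \mathcal{F}_V(\mathcal{X})$ is more specific than $A \in \mathcal{F}_V(\mathcal{X})$, then $\mathcal{I}(A^*)$ is more specific than $\mathcal{I}(A)$, i.e., for all $A, A^* \in \mathcal{F}_V(\mathcal{X})$ the inclusion $A^* \subseteq A$ implies the inclusion $\mathcal{I}(A^*) \subseteq \mathcal{I}(A)$.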
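The inclusion used in [I2] is the pointwise order of membership functions: $A^* \subseteq A$ iff $\mu_{A^*}(x) \le \mu_A(x)$ for every x. The sketch below checks this for triangular fuzzy quantities on a finite grid of the real line. The triangular shape, the grid, and the helper names are illustrative assumptions, and a grid check only approximates the condition for all x.

```python
def triangular(a, b, c):
    """Membership function of a triangular fuzzy quantity, support (a, c), peak b.
    Requires a < b < c."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


def is_subset(mu_small, mu_large, grid):
    """Pointwise inclusion mu_small(x) <= mu_large(x), checked on a finite grid."""
    return all(mu_small(x) <= mu_large(x) + 1e-12 for x in grid)


grid = [i / 100 for i in range(0, 1001)]  # 0.00 .. 10.00
A = triangular(2.0, 5.0, 8.0)
A_star = triangular(3.0, 5.0, 7.0)        # narrower, hence more specific
print(is_subset(A_star, A, grid), is_subset(A, A_star, grid))  # prints: True False
```

Under [I2], any interpolation $\mathcal{I}$ would have to map this pair so that `is_subset` also holds for the two conclusions.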

In addition to the basic properties [I0], [I1], and [I2], “smoothness” conditions on the mapping $\mathcal{I}$ are also of interest, because in many applications “similar” observations are expected to induce “similar” conclusions. To make this property more precise, adequate concepts like “continuity” for mappings from fuzzy subsets to fuzzy subsets are required. Let $d_{\mathcal{X}}$ and $d_{\mathcal{Y}}$ be metrics on $\mathcal{F}_V(\mathcal{X})$ and $\mathcal{F}_V(\mathcal{Y})$, respectively.
[I3] Continuity condition

For every $\varepsilon > 0$ there exists $\delta > 0$ such that if $A, A^* \in \mathcal{F}_V(\mathcal{X})$ and $d_{\mathcal{X}}(A, A^*) < \delta$, then for the corresponding conclusions we have $d_{\mathcal{Y}}(\mathcal{I}(A), \mathcal{I}(A^*)) < \varepsilon$.

If $\cup$ (resp. $\cap$) denotes the union (resp. intersection) of fuzzy subsets (with appropriate subscripts if operations other than the pointwise maximum (resp. minimum) of the membership functions are considered), then it is natural to require the following two conditions. [I4] says that it is better to put together all the available information first and draw the conclusion afterwards, rather than to draw a conclusion from each piece of information separately and then put the conclusions together. [I5] is the dual of [I4].

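[I4] $\mathcal{I}(A \cap_{\mathcal{X}} A^*) \subseteq \mathcal{I}(A) \cap_{\mathcal{Y}} \mathcal{I}(A^*)$, whenever $A \cap_{\mathcal{X}} A^*$ has a valid membership function.

[I5] $\mathcal{I}(A \cup_{\mathcal{X}} A^*) \supseteq \mathcal{I}(A) \cup_{\mathcal{Y}} \mathcal{I}(A^*)$, whenever $A \cup_{\mathcal{X}} A^*$ has a valid membership function.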

In the case of linear interpolation/extrapolation in a fully ordered space, it is widely accepted to choose, for a given observation, two rules as the basis of the interpolation/extrapolation. For a linear interpolation/extrapolation it is quite natural to postulate the following axioms. (For any fuzzy quantity A and real number c, denote by $A + c$ the “shifted” fuzzy quantity defined by $\mu_{A+c}(x) = \mu_A(x - c)$.)

[I6] Linearity principle

If $\mathcal{X} = \mathcal{Y} = \mathbb{R}$, $\mathcal{I}(A_1) = A_1 + c$ and $\mathcal{I}(A_2) = A_2 + c$, then for any observation A for which the antecedents of the basis of interpolation/extrapolation are $A_1$ and $A_2$ we should have $\mathcal{I}(A) = A + c$.

For c = 0 the linearity principle reduces to the

[I6*] Identity principle

If $\mathcal{X} = \mathcal{Y} = \mathbb{R}$, $\mathcal{I}(A_1) = A_1$ and $\mathcal{I}(A_2) = A_2$, then for any observation A for which the antecedents of the basis of interpolation/extrapolation are $A_1$ and $A_2$ we should have $\mathcal{I}(A) = A$.

Note that axioms [I6] and [I6*] are meaningful only if $\mathcal{X} = \mathcal{Y} = \mathbb{R}$.
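The shift $\mu_{A+c}(x) = \mu_A(x - c)$ used in [I6] can be sketched directly on membership functions. The triangular shape and the helper names are illustrative assumptions; the example only shows that shifting a triangular quantity by c moves its support and peak by c, which is the behavior the linearity principle asks the conclusion to inherit.

```python
def triangular(a, b, c):
    """Membership function of a triangular fuzzy quantity, support (a, c), peak b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


def shift(mu, c):
    """Membership function of the shifted quantity A + c: mu_{A+c}(x) = mu_A(x - c)."""
    return lambda x: mu(x - c)


A = triangular(2.0, 5.0, 8.0)
A_plus_3 = shift(A, 3.0)              # peak moves from 5 to 8
print(A_plus_3(8.0), A_plus_3(5.0))   # prints: 1.0 0.0
```

With c = 0, `shift` returns a function equal to the original membership function, matching the reduction of [I6] to the identity principle [I6*].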

In the case of linear interpolation (resp. extrapolation) in a fully ordered space, it is widely accepted to choose the basis of the interpolation (resp. extrapolation) in such a way that the observation lies “in between” the antecedents of the chosen rules (resp. the second antecedent lies “in between” the first antecedent and the observation). Then, of course, the corresponding conclusion is expected to lie “in between” the consequences of the antecedents (resp. the consequence of the second antecedent is expected to lie “in between” the consequence of the first antecedent and the conclusion). This leads to the axiom

[I7] Preserving “in between”

If the observation A is in between $A_i$ and $A_j$ (resp. $A_j$ is in between A and $A_i$), then $\mathcal{I}(A)$ should be in between $\mathcal{I}(A_i)$ and $\mathcal{I}(A_j)$ (resp. $\mathcal{I}(A_j)$ should be in between $\mathcal{I}(A)$ and $\mathcal{I}(A_i)$).
Since the choice of the relation “in between” may not be natural, this axiom is not included in the following definition.

Definition 1. We call a mapping $\mathcal{I} : \mathcal{F}_V(\mathcal{X}) \to \mathcal{F}_V(\mathcal{Y})$ an interpolation/extrapolation if it satisfies axioms [I0] through [I5], and in the case $\mathcal{X} = \mathcal{Y} = \mathbb{R}$ we call it a linear interpolation/extrapolation if in addition it satisfies [I6] (and hence [I6*]). We call $\mathcal{I}$ $\cap$-decomposable (resp. $\cup$-decomposable) if axiom [I4] (resp. [I5]) holds with equality instead of subsethood. An interpolation/extrapolation which is both $\cap$-decomposable and $\cup$-decomposable is called decomposable.
Abstract. We study the problem of allocating optical bandwidth to sets of communication requests in all-optical networks that utilize Wavelength Division Multiplexing (WDM). WDM technology establishes communication between pairs of network nodes by establishing transmitter-receiver paths and assigning wavelengths to each path so that no two paths going through the same fiber link use the same wavelength. Optical bandwidth is the number of distinct wavelengths. Since state-of-the-art technology allows for a limited number of wavelengths, the engineering problem to be solved is to establish communication between pairs of nodes so that the total number of wavelengths used is minimized; this is known as the wavelength routing problem.

In this paper we survey recent advances in bandwidth allocation in tree-shaped WDM all-optical networks. We present hardness results and lower bounds for the general problem and the special case of symmetric communication. We also survey various techniques that have been developed recently, and explain how they can be used to attack the problem. First, we give the main ideas of deterministic greedy algorithms and study their limitations. Then, we show how to use various ways and models of wavelength conversion in order to achieve almost optimal bandwidth utilization. Finally, we show that randomization can help to improve the deterministic upper bounds.



1 Introduction 

Optical fiber is rapidly becoming the standard transmission medium for backbone networks, since it can provide the data rate, error rate, and delay performance required by next-generation high-speed networks [18,37]. However, data rates in opto-electronic networks are limited by the need to convert the optical signals on the fiber to electronic signals in order to process them at the network nodes. Although electronic parallel processing techniques are, in principle, capable of meeting future high data rate requirements, the opto-electronic conversion is itself expensive. Thus, it appears likely that, as optical technology improves, simple optical processing will remove the need for opto-electronic conversion. Networks using optical transmission and maintaining optical data paths through the nodes are called all-optical networks.
Optical technology is not yet as mature as conventional technology. There are limits to how sophisticated the optical processing at each node can be.

Multiwavelength communication [18,37] is the most popular communication technology used on optical networks. Roughly speaking, it allows different streams of data to be sent on different wavelengths along an optical fiber. Multiwavelength communication is implemented through Wavelength Division Multiplexing (WDM). WDM takes all data streams traveling on an incoming link and routes each of them to the right outgoing link, provided that each data stream travels on the same wavelength on both links.

In a WDM all-optical network, once a data stream has been transmitted as light, it continues without conversion to electronic form until it reaches its destination. For a packet transmission to occur, a transmitter at the source must be tuned to the same wavelength as the receiver at the destination for the duration of the packet transmission, and no data stream collision may occur on any fiber.

We model the underlying fiber network as a directed graph, where vertices are the nodes of the network and links are optical fibers connecting nodes. Communication requests are ordered pairs of nodes, which are to be thought of as transmitter-receiver pairs. WDM technology establishes connectivity by finding transmitter-receiver directed paths and assigning a wavelength (color) to each path, so that no two paths going through the same link use the same wavelength.

Optical bandwidth is the number of available wavelengths. Optical bandwidth is a scarce resource. State-of-the-art technology allows some hundreds of wavelengths per fiber in the laboratory, even fewer in manufacturing, and no dramatic progress is anticipated in the near future. At the current state of the art, no WDM all-optical network uses the optical bandwidth efficiently. However, for WDM all-optical networks to be used realistically in long-distance communication networks, significant progress in the protocols for allocating the available bandwidth seems necessary. Thus, the important engineering problem to be solved is to establish communication between pairs of nodes so that the total number of wavelengths used is minimized; this is known as the wavelength routing problem [1,30].

Given a pattern of communication requests and a corresponding path for each request, we define the load of the pattern as the maximum number of requests that traverse any fiber of the network. For tree networks, the load of a pattern of communication requests is well defined, since transmitter-receiver paths are unique. Clearly, for any pattern of requests, its load is a lower bound on the number of necessary wavelengths, since all requests traversing a most heavily loaded fiber must be assigned distinct wavelengths.
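Computing the load of a request pattern in a directed tree can be sketched as follows. Assumptions: the tree is given by parent pointers (root maps to `None`), each tree edge stands for two opposite directed fibers, and the helper names and the four-node toy tree are hypothetical. The unique path climbs from the source to the lowest common ancestor and then descends to the destination.

```python
from collections import Counter


def path_to_root(parent, u):
    """Nodes from u up to the root, inclusive."""
    path = [u]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def directed_path_links(parent, u, v):
    """Directed links used by the unique u -> v path in a tree."""
    up, down = path_to_root(parent, u), path_to_root(parent, v)
    on_down = set(down)
    lca = next(x for x in up if x in on_down)  # lowest common ancestor
    links = []
    x = u
    while x != lca:                 # climb from u to the LCA
        links.append((x, parent[x]))
        x = parent[x]
    tail = []
    y = v
    while y != lca:                 # descend from the LCA to v (built in reverse)
        tail.append((parent[y], y))
        y = parent[y]
    return links + tail[::-1]


def load(parent, requests):
    """Maximum number of requests sharing any directed fiber link."""
    count = Counter()
    for u, v in requests:
        count.update(directed_path_links(parent, u, v))
    return max(count.values(), default=0)


# Toy tree: r is the root, a and b are its children, c is a child of a.
parent = {"r": None, "a": "r", "b": "r", "c": "a"}
requests = [("c", "b"), ("a", "b"), ("b", "c"), ("c", "r")]
print(load(parent, requests))  # prints: 3  (link (a, r) is used three times)
```

Note that (a, r) and (r, a) are counted separately, reflecting the two opposite directed fibers per tree edge.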

Theoretical work on optical networks mainly focuses on the performance of wavelength routing algorithms on regular networks using oblivious (predefined) routing schemes. We point out the pioneering work of Pankaj [30], who considered shuffle exchange, De Bruijn, and hypercubic networks. Aggarwal et al. [1] consider oblivious wavelength routing schemes for several networks. Raghavan and Upfal [33] consider mesh-like networks. Aumann and Rabani [7] improve the bounds of Raghavan and Upfal for mesh networks and also give tight results for hypercubic networks. Rabani [31] gives almost optimal results for the wavelength routing problem on meshes.

These topologies reflect architectures of optical computers rather than wide-area networks. For fundamental practical reasons, the telecommunication industry does not deploy massive regular architectures: backbone networks need to reflect irregularity of geography, non-uniform clustering of users and traffic, hierarchy of services, dynamic growth, etc. In this direction, Raghavan and Upfal [33], Aumann and Rabani [7], and Bermond et al. [10], among other results, focus on bounded-degree networks and give upper and lower bounds in terms of the network expansion.

However, wide-area multiwavelength technology is expected to grow around the evolution of current networking principles and existing fiber networks. These are mainly SONET (Synchronous Optical Network) rings and trees [18,37]. In this sense, even asymptotic results for expander graphs do not address the above telecommunications scenario.

In this work we consider tree topologies, with each edge of the tree consisting of two opposite directed fiber links. Raghavan and Upfal [33] considered trees with single undirected fibers carrying undirected paths. However, it has since become apparent that optical amplifiers placed on fibers will be directed devices. Thus, directed graphs are essential to model state-of-the-art technology.

In particular, we survey recent methods and algorithms for efficient use of 
bandwidth in trees considering arbitrary patterns of communication requests. 
All the results are given in terms of the load of the pattern of requests that have 
to be routed. Surveys on bandwidth allocation for more specific communication 
patterns like broadcasting, gossiping, and permutation routing can be found in 
[9,22]. 

The rest of this paper is structured as follows. In Section 2 we formalize the 
wavelength routing problem and present hardness results and lower bounds for 
the general case and the special case of symmetric communication. In Section 3 
we describe deterministic greedy algorithms and present the best known results 
on them. In Section 4 we relax the model and introduce devices called converters 
which can improve bandwidth allocation with some sacrifice in network cost. In 
Section 5 we briefly describe the first randomized algorithm for the problem. We 
conclude in Section 6 with a list of open problems. 

2 Hardness Results and Lower Bounds 

As we mentioned in the previous section, the load of the communication pattern is a lower bound on the number of necessary colors (wavelengths). The following question now arises: given any communication pattern of load L, can we hope for a wavelength routing with no more than L wavelengths? The answer is negative for two reasons. First, the problem is NP-hard. Second, there are patterns that require more than L wavelengths.
We formalize the problem as follows.

Wavelength Routing in Trees

Instance: A directed tree T and a pattern of communication requests (i.e., a set of directed paths) P of load L.

Question: Is it possible to assign wavelengths (colors) from {1, 2, ..., L} to requests of P in such a way that requests that share the same directed link are assigned different wavelengths?

Intuitively, we may think of the wavelengths as colors and the wavelength 
routing problem as a coloring problem of directed paths. In the rest of the 
paper we use the terms wavelength (wavelength routing) and color (coloring), 
interchangeably. 
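Checking whether a given assignment answers the question above positively can be sketched as a simple verifier. Assumptions: each request is given directly as the list of directed links on its path (for example, as produced by a path-finding helper), and the names and toy data are hypothetical. The pairwise check is quadratic in the number of requests, which is fine for a sketch.

```python
from itertools import combinations


def is_proper_coloring(paths, colors, num_colors):
    """paths: request -> list of directed links; colors: request -> color.

    Proper means every color lies in 1..num_colors and no two requests
    sharing a directed link have the same color.
    """
    if any(not 1 <= colors[p] <= num_colors for p in paths):
        return False
    for p, q in combinations(paths, 2):
        if colors[p] == colors[q] and set(paths[p]) & set(paths[q]):
            return False
    return True


# Toy pattern of load 2: r1 and r2 share links (a, r) and (r, b);
# r3 runs in the opposite direction and shares no directed link with them.
paths = {
    "r1": [("c", "a"), ("a", "r"), ("r", "b")],
    "r2": [("a", "r"), ("r", "b")],
    "r3": [("b", "r"), ("r", "a"), ("a", "c")],
}
print(is_proper_coloring(paths, {"r1": 1, "r2": 2, "r3": 1}, num_colors=2))  # prints: True
```

Verifying a coloring is easy; by Theorems 1 and 2 below, the difficulty lies in finding one with only L colors.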

Erlebach and Jansen [14] have proved the following hardness result. 

Theorem 1 ([14]). Wavelength Routing in Trees is NP-complete. 

Note that the above statement is true even if we restrict instances to arbitrary trees and communication patterns of load 3. The following statement, which is also due to Erlebach and Jansen [15], applies to binary trees and communication patterns of arbitrary load.

Theorem 2 ([15]). Wavelength Routing on Binary Trees is NP-complete.

Thus, the corresponding optimization problems (minimizing the number of 
wavelengths) are NP-hard, in general. 

Now, we give the second reason why, given a communication pattern of load L, a wavelength routing with L wavelengths, or even with not much more than L wavelengths, may be infeasible.

Theorem 3 ([25]). For any integer l > 0, there exists a communication pattern of load L = 4l on a binary tree T that requires at least 5L/4 wavelengths.

Theorems 1 and 2 hold even in the special case of patterns of symmetric communication requests [11], i.e., for any transmitter-receiver pair of nodes $(u_1, u_2)$ in the communication pattern, its symmetric pair $(u_2, u_1)$ also belongs to the pattern. Furthermore, there exist patterns of symmetric communication requests of load L on binary trees which require arbitrarily close to 5L/4 wavelengths [11]. The construction of these patterns is much more complicated than that of [25]. In the same work [11], interesting inherent differences between the general problem and the symmetric case are also shown.

3 Greedy Algorithms 

All known wavelength routing algorithms [29,20,25,21,19,13] belong to a special
class of algorithms, the class of greedy algorithms. We devote this section to
their study. Given a tree network T and a pattern of requests P, we call a
wavelength routing algorithm greedy if it works as follows:
Starting from a node, the algorithm computes a breadth-first (BFS)
numbering of the nodes of the tree. The algorithm proceeds in phases,
one for each node u of the tree. The nodes are considered following their
BFS numbering. The phase associated with node u assumes that we
already have a proper coloring where all requests that touch (i.e., start,
end, or go through) nodes with numbers strictly smaller than u's have
been colored and no other request has been colored. During this phase,
the partial proper coloring is extended to one that assigns proper colors
to requests that touch node u but have not been colored yet. During each
phase, the algorithm does not recolor requests that have been colored in
previous phases.
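The phase structure can be sketched as below. This is an illustration under stated assumptions, not one of the algorithms of [29,20,25,21,19,13]. The extension step inside a phase is plain first fit, i.e. the smallest color that is free on all links of the request. This is a hypothetical stand-in for the bipartite edge coloring subroutines discussed next, so the sketch carries no 5L/3 guarantee. What it does share with the definition is the BFS order of the phases, the fact that each request is colored in the phase of the first node it touches, and the absence of recoloring. All function names are illustrative.

```python
from collections import deque


def bfs_order(adj, root):
    """Breadth-first numbering of the tree nodes, starting from root."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        x = queue.popleft()
        order.append(x)
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return order


def tree_path(adj, s, t):
    """Unique node sequence from s to t in a tree."""
    parent, queue = {s: None}, deque([s])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    path = [t]
    while path[-1] != s:
        path.append(parent[path[-1]])
    return path[::-1]


def greedy_first_fit(adj, requests, root):
    """Greedy phases in BFS order; each phase extends the coloring by first fit."""
    paths = [tree_path(adj, s, t) for s, t in requests]
    links = [set(zip(p, p[1:])) for p in paths]
    colors = [None] * len(requests)
    used = {}  # directed link -> colors already assigned on it
    for u in bfs_order(adj, root):  # one phase per node
        for i, path in enumerate(paths):
            if colors[i] is None and u in path:  # uncolored request touching u
                busy = set().union(*(used.get(link, set()) for link in links[i]))
                c = 1
                while c in busy:  # smallest color free on all of its links
                    c += 1
                colors[i] = c  # fixed: later phases never recolor it
                for link in links[i]:
                    used.setdefault(link, set()).add(c)
    return colors


# Star with center 0 and all ordered pairs of leaves: load 2.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
reqs = [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
print(greedy_first_fit(star, reqs, root=0))  # [1, 2, 1, 3, 2, 3]
```

On this load-2 pattern first fit ends up with three colors, although two suffice (for instance [1, 2, 2, 1, 1, 2] in request order).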

Thus, various greedy algorithms differ among themselves in the strategies fol- 
lowed to extend the partial proper coloring during a phase. The algorithms in 
[29,20,25,21] make use of complicated subroutines in order to extend the partial 
coloring during a phase; in particular, their subroutines include a reduction of 
the problem to an edge coloring problem on a bipartite graph. On the other 
hand, the algorithms in [19,13] use much simpler methods to solve the wave- 
length routing problem on binary trees. The common characteristic for all these 
algorithms is that they are deterministic. 

The algorithms in [29,20,25,21] reduce the coloring of a phase associated
with node u to an edge coloring problem on a bipartite graph. In the following,
we describe this reduction.

Let v0 be u's parent and let v1, ..., vk be the children of u. The algorithm
constructs the bipartite graph associated with u in the following way. For each
node vi, the bipartite graph has four vertices Wi, Xi, Yi, Zi; the left and right
partitions are {Wi, Zi | i = 0, ..., k} and {Xi, Yi | i = 0, ..., k}. For each request
directed out of some vi into some vj, we have an edge in the bipartite
graph from Wi to Xj. For each request directed out of some vi and terminating
at u, we have an edge from Wi to Yi. Finally, for each request directed out of
u into some vi, we have an edge from Zi to Xi. See Figure 1. The above edges
are called real. Notice that all edges adjacent to either X0 or W0 have
already been colored, as they correspond to requests touching a node with BFS
number smaller than u's (in this specific case, the requests touch u's parent)
that were colored in some previous phase. We call the edges incident to
either X0 or W0 color-forced edges.
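The construction can be written down directly. In this sketch the names are hypothetical. `neighbors` lists v0 (the parent) followed by the children v1, ..., vk. Requests are given as node sequences, u is assumed not to be the root, and each vertex of the bipartite graph is a tuple such as ("W", 0) for W0. Only real edges are produced, and `color_forced` picks out those incident to W0 or X0.

```python
def bipartite_graph(u, neighbors, paths):
    """Real edges of the bipartite graph associated with node u.

    neighbors = [v0, v1, ..., vk], where v0 is the parent of u and the rest
    are its children; paths are requests given as node sequences.
    """
    index = {v: i for i, v in enumerate(neighbors)}
    edges = []
    for p in paths:
        if u not in p or len(p) < 2:
            continue  # request does not touch u (or is trivial)
        pos = p.index(u)
        prev = p[pos - 1] if pos > 0 else None
        nxt = p[pos + 1] if pos + 1 < len(p) else None
        if prev is not None and nxt is not None:  # out of v_i into v_j
            edges.append((("W", index[prev]), ("X", index[nxt])))
        elif prev is not None:  # out of v_i, terminating at u
            i = index[prev]
            edges.append((("W", i), ("Y", i)))
        else:  # out of u into v_i
            i = index[nxt]
            edges.append((("Z", i), ("X", i)))
    return edges


def color_forced(edges):
    """Edges incident to W0 or X0, i.e. requests that also touch u's parent."""
    return [e for e in edges if ("W", 0) in e or ("X", 0) in e]


# Node 1 with parent 0 and children 2 and 3.
paths = [[0, 1, 2], [2, 1, 3], [2, 1], [1, 3], [3, 1, 0]]
edges = bipartite_graph(1, [0, 2, 3], paths)
print(len(edges), len(color_forced(edges)))  # 5 2
```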

Notice that no real edge joins the opposite vertices Zi and Yi, or Wi
and Xi. Indeed, vertex Zi has edges only to vertices of type Xp; on the other
hand, an edge from Wi to Xi would correspond to a request in the tree going
from vi to itself. We call a pair of opposite vertices a line. Notice also that all
vertices of the bipartite graph have degree at most L and, thus, it is possible to
add fictitious edges to the bipartite graph so that all vertices have degree exactly
L. The following claim holds.

Claim ([29]). Let P be a pattern of communication requests on a tree T. Consider
a specific BFS numbering of the nodes of T, a node u, and a partial coloring
χ of the requests of P that touch nodes with BFS number smaller than u's. Then
any coloring of the edges of the bipartite graph associated with u that agrees
with χ on the color-forced edges corresponds to a legal coloring of the requests
of P that touch node u.

Fig. 1. Requests touching node u and the corresponding bipartite graph (only real
edges are shown)

Thus, the problem of coloring requests is reduced to the problem of coloring
the edges of an L-regular bipartite graph, under the constraint that some colors
have already been assigned to the edges adjacent to W0 and X0. We call this problem
the α-constrained bipartite edge coloring problem on an L-regular bipartite graph.
The parameter α indicates that the edges incident to W0 and X0 have been
colored with αL colors. The objective is to extend the coloring to all the edges
of the bipartite graph.
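The text only states that fictitious edges can be added. One simple way to do this is to pair up the missing degree on the two sides, assuming parallel edges are allowed (the fictitious edges carry no request). This works because both sides have the same number of vertices, {Wi, Zi} and {Xi, Yi} for i = 0, ..., k, so the total missing degree is the same on each side. The function name and the representation are illustrative.

```python
from collections import Counter


def pad_to_regular(left, right, edges, L):
    """Add fictitious edges so that every vertex has degree exactly L.

    Parallel edges are allowed; requires |left| == |right| and max degree <= L.
    """
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    if len(left) != len(right) or any(deg[v] > L for v in list(left) + list(right)):
        raise ValueError("need equal-sized sides and maximum degree at most L")
    # One slot per missing unit of degree; both lists have length |left|*L - |edges|.
    free_left = [v for v in left for _ in range(L - deg[v])]
    free_right = [v for v in right for _ in range(L - deg[v])]
    fictitious = list(zip(free_left, free_right))
    return edges + fictitious, fictitious


# Node u with parent v0 and one child v1, load L = 2.
left = [("W", 0), ("Z", 0), ("W", 1), ("Z", 1)]
right = [("X", 0), ("Y", 0), ("X", 1), ("Y", 1)]
edges = [(("W", 0), ("X", 1)), (("W", 1), ("Y", 1)), (("Z", 1), ("X", 1))]
padded, fake = pad_to_regular(left, right, edges, 2)
print(len(fake))  # 5
```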

The works [29,20,25,21] give efficient solutions to this problem. They either 
consider matchings in pairs and color them in sophisticated ways using detailed 
potential and averaging arguments for the analysis [29,25] or partition matchings
into groups which can be colored and accounted for independently [20,21].
In particular, Kaklamanis et al. [21] solve the problem proving the following 
theorem. 

Theorem 4 ([21]). For any α ∈ [1, 4/3] and integer L > 0, there exists a
polynomial time algorithm for the α-constrained bipartite edge coloring problem
on an L-regular bipartite graph which uses at most (1 + α/2)L total colors and
at most 4L/3 colors per line.

The interested reader may refer to the papers [29,20,25,21] for a detailed
description of the techniques. Note that one might consider bipartite edge coloring
problems with different constraints. Tight bounds on the number of colors for
more general constrained bipartite edge coloring problems can be found in
[12].

Using as a subroutine the coloring algorithm presented in [21] for α = 4/3,
the wavelength routing algorithm maintains the following two invariants at
each phase:

I. Each phase uses a total number of colors no greater than 5L/3. 

II. The number of colors seen by two opposite directed links is at most 4L/3. 
In this way the following statement can be proved.

Theorem 5 ([21], see also [16]). There exists a polynomial time greedy algo- 
rithm which routes any pattern of communication requests of load L on a tree 
using at most 5L/3 wavelengths. 
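A small arithmetic check shows how Theorem 4 yields the bound of Theorem 5. At node u, the color-forced edges correspond to the requests on the two opposite links between u and its parent. These links formed a line in the parent's phase, so Invariant II limits them to 4L/3 colors, which means α = 4/3. Theorem 4 then gives (1 + α/2)L = 5L/3 colors per phase. The helper below is hypothetical and only evaluates these expressions exactly with `fractions.Fraction`.

```python
from fractions import Fraction


def phase_color_bounds(alpha, L):
    """Bounds of Theorem 4 for the alpha-constrained problem on an L-regular graph."""
    alpha = Fraction(alpha)
    if not Fraction(1) <= alpha <= Fraction(4, 3):
        raise ValueError("Theorem 4 is stated for alpha in [1, 4/3]")
    total = (1 + alpha / 2) * L    # colors used in the phase (Invariant I)
    per_line = Fraction(4, 3) * L  # colors seen by two opposite links (Invariant II)
    return total, per_line


# Invariant II at the parent's phase caps the color-forced edges at 4L/3 colors,
# so every phase is an instance with alpha = 4/3.
print(phase_color_bounds(Fraction(4, 3), 30))  # (Fraction(50, 1), Fraction(40, 1))
```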

In the same work [21] (see also [16]), it was proved that, in general, greedy
deterministic algorithms cannot route patterns of communication requests of
load L in trees using fewer than 5L/3 wavelengths. Thus, the greedy algorithm
presented in [21,16] is best possible within the class of deterministic greedy 
algorithms. In Section 5 we demonstrate how randomization can be used to beat 
the barrier of 5/3 at least on binary trees. 

4 Wavelength Conversion 

In the previous section we saw that (deterministic) greedy wavelength routing
algorithms in tree networks cannot, in the worst case, use fewer than 5L/3
wavelengths to route sets of communication requests of load L, resulting in 60%
utilization of the available bandwidth. Furthermore, even if a better (non-greedy or
randomized) algorithm is discovered, we know by Theorem 3 that there exist
communication patterns of load L that require 5L/4 wavelengths, meaning that
20% of the available bandwidth across the fiber links will remain unutilized.
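The percentages follow from dividing the load by the number of wavelengths. This assumes, as the paragraph implicitly does, that utilization is measured on a link that carries the full load L. A minimal check with exact fractions follows; the helper name is illustrative.

```python
from fractions import Fraction


def worst_link_utilization(load, wavelengths):
    """Share of the available wavelengths that a maximally loaded link carries."""
    return Fraction(load, 1) / wavelengths


L = 12
print(worst_link_utilization(L, Fraction(5, 3) * L))  # 3/5, i.e. 60%
print(worst_link_utilization(L, Fraction(5, 4) * L))  # 4/5, i.e. 20% unused
```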

The inefficiency of greedy algorithms in the allocation of the bandwidth is due 
to the fact that greedy algorithms color requests going through node u without 
“knowing” which requests go through its children. 

Thus, if we are seeking better utilization of optical bandwidth, we have to
relax some of the constraints of the problem. In particular, wavelength converters
make it possible to relax the restriction that a request has to use the same
wavelength along its whole path from the transmitter to the receiver.

A possibility would be to convert the optical signal into electronic form and 
to retransmit it at a different wavelength. If there is no restriction on the wave- 
lengths on which the message can be retransmitted, then it is possible to route 
all patterns of communication requests of load L with L wavelengths. In fact, 
in this case the assignment of wavelengths to requests on a link of the network
is independent of the wavelengths assigned to the same requests on the other
links.
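A short sketch shows why L wavelengths suffice with unrestricted conversion. Every directed link is colored on its own by numbering its requests 1, 2, and so on. As a result, no link needs more colors than the number of requests on it, and the maximum over all links is the load L. The function below is hypothetical. Requests are given as node sequences, and the returned wavelengths may differ from link to link along a request, which is exactly what full conversion at every node permits.

```python
from collections import defaultdict


def per_link_coloring(paths):
    """Color every directed link independently (full conversion at every node).

    paths are requests given as node sequences. Returns, for each request index,
    a dict mapping each of its links to the wavelength used on that link.
    """
    on_link = defaultdict(list)
    for i, p in enumerate(paths):
        for link in zip(p, p[1:]):
            on_link[link].append(i)
    wavelength = defaultdict(dict)
    for link, reqs in on_link.items():
        for color, i in enumerate(reqs, start=1):  # 1..(requests on this link)
            wavelength[i][link] = color
    return dict(wavelength)


# Star pattern of load 2: two wavelengths suffice once conversion is free.
paths = [[1, 0, 2], [1, 0, 3], [2, 0, 1], [2, 0, 3], [3, 0, 1], [3, 0, 2]]
assignment = per_link_coloring(paths)
print(max(c for links in assignment.values() for c in links.values()))  # 2
print(assignment[1])  # {(1, 0): 2, (0, 3): 1}: request 1 changes wavelength at node 0
```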

However, converting optical signals to electronic signals has the drawback of
wasting the benefits of using optical communication. Recently, a new technology
has been proposed that makes it possible to change the wavelength of an optical
signal without converting it into electronic form. Wavelength converters have been
designed and constructed [43]. A wavelength converter, placed at a node of the
network, can be used to change the wavelengths assigned to requests traversing
that node. The effects of wavelength conversion have been extensively studied
in different models, and it has been proved that conversion can dramatically
improve the efficiency of optical bandwidth allocation [8,24,34,35,36,39,42].
However, wavelength conversion is a very expensive technology, and it is not
realistic to assume that every node of the network can change the wavelength
of every request. This motivates the study of all-optical
networks that allow some form of restricted wavelength conversion.

In the literature, two prevalent approaches are used to address WDM optical 
networks with wavelength conversion: sparse conversion and limited conversion. 
In a network with sparse wavelength conversion, only a fraction of the nodes are 
equipped with wavelength converters that are able to perform an arbitrary num- 
ber of simultaneous conversions; in a network with limited conversion, instead, 
each node hosts wavelength converters, but these devices can perform a limited 
number of conversions. 

Sparse conversion optical networks have been considered in [35,39,41,23]. Wil- 
fong and Winkler [41] consider the problem of minimizing the number of nodes 
of a network that support wavelength conversion in order to route any com- 
munication pattern using a number of wavelengths equal to the optimal load. 
They prove that the problem is NP-hard. Kleinberg and Kumar [23] present a 
2-approximation to this problem for general networks, exploiting its relation to 
the problem of computing the minimum vertex cover of a graph. Ramaswami 
and Sasaki [35] show that, in a ring network, a single converter is sufficient to
guarantee that any pattern of requests of load L can be routed with L wavelengths.
Subramaniam et al. [39] give heuristics to allocate wavelengths, based
on probabilistic models of communication traffic. 

Variants of the limited conversion model have been considered by Ramaswami 
and Sasaki [35], Yates et al. [42], and Lee and Li [26]. Ramaswami and Sasaki [35] 
propose ring and star networks with limited wavelength conversion to support 
communication patterns efficiently. Although they address the undirected case, 
all their results for rings translate to the directed case as well. Furthermore, 
they propose algorithms for bandwidth allocation in undirected stars, trees, and 
networks with arbitrary topologies for the case where the length of requests is 
at most two. 



4.1 The Network Model 

In our network model, some nodes of the network host wavelength converters. A
wavelength converter can be modeled as a bipartite graph G = (X, Y, E). Each
of the vertex sets X and Y has one vertex for each wavelength, and there
is an edge between vertices x ∈ X and y ∈ Y if and only if the converter is capable
of converting the wavelength corresponding to x to the wavelength corresponding
to y. For example, a full converter corresponds to a complete bipartite graph, and
a fixed conversion converter corresponds to a bipartite graph where the vertices
of X have degree one. Some examples of wavelength converters are depicted in
Figure 2.

In order to increase network performance without a tremendous increase in
cost, we mainly use converters of limited functionality. The term “limited”
reflects the fact that the converters are simple according to two measures: their
degree and their size. We now define these two measures.
Fig. 2. Wavelength conversion bipartite graphs: (A) fixed conversion, (B) partial
conversion, (C) full conversion, and (D) no conversion



Definition 6 The degree of a converter is the maximum degree of the vertices 
of its corresponding bipartite graph. 



Definition 7 The size of a converter is the number of edges in the corresponding 
bipartite graph. 
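Definitions 6 and 7 can be made concrete by storing a converter as a set of (x, y) pairs, one per edge of its bipartite graph, with wavelengths numbered 0, ..., w - 1. This is an illustrative sketch and the helper names are hypothetical. `shift_converter` (x may stay as x or become x + 1 mod w) is an invented example of partial conversion and is not necessarily the one drawn in Figure 2.

```python
from collections import Counter


def full_converter(w):
    """Complete bipartite graph: any wavelength to any wavelength."""
    return {(x, y) for x in range(w) for y in range(w)}


def no_conversion(w):
    """Each wavelength can only stay as it is."""
    return {(x, x) for x in range(w)}


def fixed_conversion(mapping):
    """Wavelength x is always converted to mapping[x]."""
    return {(x, y) for x, y in enumerate(mapping)}


def shift_converter(w):
    """Hypothetical partial converter: x may become x or x + 1 (mod w)."""
    return {(x, (x + s) % w) for x in range(w) for s in (0, 1)}


def degree(converter):
    """Definition 6: maximum degree over the vertices of X and Y."""
    deg = Counter()
    for x, y in converter:
        deg[("X", x)] += 1
        deg[("Y", y)] += 1
    return max(deg.values(), default=0)


def size(converter):
    """Definition 7: number of edges of the bipartite graph."""
    return len(converter)


w = 4
for name, conv in [("fixed", fixed_conversion([1, 2, 3, 0])),
                   ("partial", shift_converter(w)),
                   ("full", full_converter(w)),
                   ("none", no_conversion(w))]:
    print(name, degree(conv), size(conv))
# fixed 1 4
# partial 2 8
# full 4 16
# none 1 4
```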

Let u be a node that hosts converters: each converter located at u is assigned
to a pair of incoming and outgoing directed links adjacent to u. This converter
will be used only to change the colors assigned to requests that use both of these
directed links, while they traverse u. Thus, a request may have one color on the
incoming link and a different color on the outgoing link. In other words, the
corresponding connection may travel on one wavelength on the segment
ending at u and on a different wavelength on the segment starting from u.

In the discussion that follows, we consider three models of limited conver- 
sion networks, differing in the number of converters placed at nodes and in the 
placement pattern. Denote by d the degree of u and by p the parent of u. The 
models we consider are the following: 

all-pairs There is one converter for each pair consisting of an incoming link and
an outgoing link of u that connect u to different neighbors. Thus, the number of
converters at u is d(d - 1) and we can change the color of all requests traversing u.

top-down For each child v of u, there is a converter between links (p, u)
and (u, v) and another converter between links (v, u) and (u, p). The number
of converters at u is 2(d - 1) and we can change the color only of requests coming
from or going to p.

down For each child v of u, there is a converter between links (p, u) and
(u, v) and a converter between links (w, u) and (u, v) for each child w of
u different from v. The number of converters at u is (d - 1)^2 and we can
change the color only of requests going from p to a descendant of u or of requests
traversing two distinct children of u.
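The converter counts can be checked by enumerating the (incoming link, outgoing link) pairs that each model equips, for a non-root node u of degree d with parent p. This sketch uses hypothetical function and node names and counts one converter per ordered pair of distinct neighbors in the all-pairs model, which gives d(d - 1). The top-down and down models give 2(d - 1) and (d - 1)^2. Their union is exactly the all-pairs set, which is consistent with the remark that algorithms for these two models also work for all-pairs networks.

```python
def converter_pairs(model, parent, children, u="u"):
    """(incoming link, outgoing link) pairs that get a converter at non-root node u."""
    pairs = set()
    if model == "all-pairs":
        nbrs = [parent] + list(children)
        pairs = {((a, u), (u, b)) for a in nbrs for b in nbrs if a != b}
    elif model == "top-down":
        for v in children:
            pairs.add(((parent, u), (u, v)))  # from the parent down to v
            pairs.add(((v, u), (u, parent)))  # from v up to the parent
    elif model == "down":
        for v in children:
            pairs.add(((parent, u), (u, v)))
            for w in children:
                if w != v:
                    pairs.add(((w, u), (u, v)))  # between two distinct children
    else:
        raise ValueError(f"unknown model: {model}")
    return pairs


# Internal node of a binary tree: parent p and children a, b, so d = 3.
for model in ("all-pairs", "top-down", "down"):
    print(model, len(converter_pairs(model, "p", ["a", "b"])))
# all-pairs 6
# top-down 4
# down 4
```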

Figure 3 shows how converters are positioned at a node of a limited
conversion binary tree.

Clearly, a wavelength routing algorithm for down and top-down limited
conversion networks also works for all-pairs limited conversion networks. However,
the converse does not necessarily hold.
Fig. 3. The position of wavelength converters at a node of a limited conversion binary
tree network 



4.2 Recent Results 

Recently, in a series of papers [4,5,6,17] (see also [2]), significant progress has 
been made in bandwidth allocation in sparse and limited conversion trees. In 
the following, we briefly discuss these results. 

Auletta et al. [4] prove that converters in at least … of the n nodes
of a tree may be required if we seek optimal bandwidth utilization. They
furthermore demonstrate how to locate full converters in at most … nodes
so that optimal bandwidth utilization is always feasible.

In the case of limited conversion, Auletta et al. [4] show how to construct
converters of degree 2√L - 1 which allow optimal bandwidth utilization in
top-down binary trees, i.e., all sets of requests of load L can be routed with L
wavelengths. In [6], using properties of Ramanujan graphs and dispersers (classes 
of graphs which have been explicitly constructed in [27,28,40]), they construct 
converters of almost linear size which also allow optimal bandwidth utilization 
in top-down binary trees. 

Auletta et al. [6] and independently Gargano [17] show how to achieve optimal 
bandwidth utilization in all-pairs conversion binary trees using converters of de- 
gree 15. The construction of these converters is based on properties of Ramanujan 
graphs. Furthermore, Gargano [17] shows how to achieve nearly optimal bandwidth
utilization in down-conversion arbitrary trees with low-degree converters.
The construction of converters is based on properties of Ramanujan graphs as 
well. The interested reader may see [2] for formal proofs of these results. 
5 Randomized Algorithms

In an attempt to beat the 5/3 lower bound for deterministic greedy algorithms, 
Auletta et al. [3] define the class of randomized greedy wavelength routing algo- 
rithms. Randomized greedy algorithms have the same structure as deterministic 
ones, i.e., starting from a node, they consider the nodes of the tree in a BFS 
manner. Their main difference is that a randomized greedy algorithm A uses a 
palette of colors and at each phase associated with a node, A picks a random 
proper coloring of the uncolored requests using colors of the palette according 
to some probability distribution. 

The results presented in the following were originally obtained in [3]. The 
interested reader may see [3] for further details. 

As far as lower bounds are concerned, Auletta et al. [3] prove that, with very
high probability, routing requests of load L with fewer than 3L/2 colors in trees of
depth Ω(L), and routing requests of load L with fewer than 1.293L - o(L) colors in
trees of constant depth, is infeasible using greedy algorithms. These statements
are proved by considering randomized adversaries for the construction of the
pattern of communication requests.

In the following we give the main ideas of a randomized wavelength routing 
algorithm presented in [3]. The algorithm has a greedy structure but allows for 
limited recoloring at the phases associated with each node. 

At each phase, the wavelength routing algorithm maintains the following two 
invariants: 

I. Each phase uses a total number of colors no greater than 7L/5. 

II. The number of colors seen by two opposite directed links is exactly 6L/5. 

At a phase associated with a node u, a coloring procedure is executed which
extends the coloring of the requests that touch u and its parent node to the requests
that touch u and are still uncolored. The coloring procedure is randomized (it
selects the coloring of the uncolored requests according to a specific probability
distribution). In this way, the algorithm can complete the coloring at the phase
associated with node u using at most 7L/5 colors in total, keeping the number
of colors seen by the opposite directed links between u and its children at 6L/5.

At each phase associated with a node u, the algorithm is enhanced by a 
recoloring procedure which recolors a small subset of requests in order to main- 
tain some specific properties on the (probability distribution of the) coloring of 
requests touching u and its parent. This procedure is randomized as well. 

The recoloring procedure at each phase of the algorithm works with very
high probability, while the coloring procedure at each phase always works correctly,
maintaining the two invariants. As a result, if the depth of the tree is not very
large (no more than O(L^{1/3})), the algorithm executes the phases associated with
all nodes, with high probability.

After the execution of all phases, the requests that had to be recolored by the
executions of the recoloring procedure are colored using the simple deterministic
greedy algorithm with at most o(L) extra colors. This is because, as long as
the depth of the tree is not very large, the load of the set of requests being
recolored is at most o(L), with high probability.

In this way, the following theorem is proved. The interested reader may look 
at [3] for a detailed description of the algorithm and formal proofs. 

Theorem 8 ([3]). Let 0 < δ < 1/3 be a constant. There exists a randomized
wavelength routing algorithm that routes any pattern of communication requests
of load L on a binary tree of depth at most L^δ/8 using at most 7L/5 + o(L)
colors, with probability at least 1 - exp(…).

6 Open Problems 

Recent work on wavelength routing in trees has revealed many open problems; 
some of them are listed below. 

— The main open problem is to close the gap between 5L/4 and 5L/3 for the 
number of wavelengths sufficient for routing communication patterns of load 
L on arbitrary trees (see Theorems 3 and 5). Closing the gap between 5L/4 
and 7L/5 + o(L) for binary trees also deserves some attention.

— Furthermore, although for deterministic greedy algorithms we know tight 
bounds on the number of wavelengths, this is not true for randomized greedy 
algorithms. Exploring the power of randomized greedy algorithms in more 
depth is interesting as well. 

— Many improvements can be made in the work on wavelength conversion. 
The main open problem here is whether we can achieve optimal bandwidth 
utilization in arbitrary trees using converters of constant degree (or even 
linear size). We conjecture that this may be feasible in all-pairs conversion 
trees with deterministic greedy algorithms. 
Abstract. The last decade has seen a considerable increase in computer and
network performance, mainly as a result of faster hardware and more
sophisticated software. In fact, the need for realistic simulations of complex
systems relevant to the modeling of several modern technologies and
environmental phenomena increasingly stimulates the development of new
advanced computing approaches.

Today it is possible to couple a wide variety of resources including 
supercomputers, storage systems, data sources, and special classes of devices 
distributed geographically, and use them as a single unified resource, thus 
forming what is popularly known as a computational grid. The initial aim of 
Grid Computing activities was to link supercomputing sites (Metacomputing); 
current objectives go far beyond this. According to Larry Smarr, NCSA 
Director, a Grid is a seamless, integrated computational and collaborative 
environment. Many applications can benefit from the grid infrastructure, 
including collaborative engineering, data exploration, high throughput 
computing, and of course distributed supercomputing [1,2,3].

Grid applications (multi-disciplinary applications) couple resources that
cannot be replicated at a single site, or that may be globally located for other
practical reasons. These are some of the driving forces behind the inception of
grids. In this light, grids let users solve larger or new problems by pooling
together resources that could not be coupled easily before.

Hence the Grid is not just a computing paradigm for providing
computational resources for grand-challenge applications. It is an infrastructure
that can bond and unify globally remote and diverse resources ranging from
meteorological sensors to data vaults, from parallel supercomputers to personal
digital organizers.

Currently, there are many grid projects underway worldwide. A very
comprehensive listing can be found in [4,5]. Moreover, two important
international open forums, Grid (www.gridforum.org) and E-Grid
(www.egrid.org), have been created in order to promote and develop Grid
Computing technologies and applications. This talk aims to present the state
of the art of grid computing and attempts to survey the major international
initiatives in this area.
Abstract. The monitoring and control of devices and systems has attracted
attention from the field of knowledge-based systems for a number
of reasons. The problem is intrinsically knowledge-intensive, benefiting 
from awareness of the behaviour and interactions of components. There 
are complexities such as time constraints, partial and qualitative infor- 
mation, and in many cases there is a need for a degree of human under- 
standability in performance. In this paper the problems of control are 
characterised, and a selection of applicable knowledge-based techniques
described. Some architectural issues for control system design are also 
discussed. Two case studies are described which illustrate the variety of 
problems and approaches: one for control of flooding in a city, the other 
for control of an anaerobic waste treatment plant. 



1 Introduction 

The word “control”, whether used in the context of information technology or 
more generally, implies the existence of a desired state of the world from which 
the actual state of the world may differ. It furthermore implies the possibility of 
taking actions which will result in adjustments to the state of the world, with 
a view to bringing the actual state closer into line with the desired. From this 
starting point a number of the problems of control systems are already visible 
on the horizon, and likewise the importance of knowledge in coping with them. 

Let us start thinking about control systems in terms of the underlying sys- 
tem that they control. Such a system implements a process, which may be more 
or less well understood, and the process is characterised by inputs and outputs. 
Furthermore, it operates in an environment which may influence its behaviour. 
Already there are complexities emerging in this view of the situation: for ex- 
ample, feedback which blurs the distinction between inputs and outputs, or dis- 
tinguishing between inputs to the process and important environmental factors 
which influence it. 

Examples of such systems include: 

— The currency of a nation, where a key output is the exchange rate with 
respect to other currencies, one of the inputs is the interest rate set by 
the national bank, and the environment includes the economic state and 
prospects of the economy, the beliefs and expectations of investors, and a
multitude of other factors. 

— The central heating of an office building, where the output is the desired 
temperature in the offices, the input is the operation of the boiler as deter- 
mined by the setting of the thermostat, and the environment includes the 
exterior temperature. 

— The flood defences of a city, where the outputs are the levels of water in 
rivers, retention basins, and (in the worst case) city streets, the inputs are 
the rainfall and the states of the flood defences, and the environment includes 
the level of the tide in the river. 

It is clear from considering these examples and many others that there is 
a distinction between outputs at the highest level, such as “amount of flood- 
ing”, and what is directly measurable and available to the controlling system. 
These measurables will include levels of water at particular points in the river 
or drainage system, and the state of functioning of apparatus. There is like- 
wise a distinction between the conceptual inputs, such as “rainfall”, and the 
direct controllables, including operation of pumps and sluice gates. Taking these 
ideas into consideration, we can now formulate our view of a control system 
diagrammatically, as shown in Fig. 1. We side-step the problems of representing
inputs, outputs and environment by thinking in terms of controllables and
measurables — those entities that may be directly set and directly evaluated by 
the controlling system. 




Fig. 1. A view of control 



A number of important points are worth mentioning before we proceed to 
consider the place of knowledge in control systems. 





Knowledge-Based Control Systems 



77 



— The entity that performs the control may be a human being, a computer 
system, or a combination of both. The diagram says nothing about where 
the responsibility lies for the actions taken on the controllables. Though it 
is possible that the system operates without intervention or supervision by 
a human, this is only one end of a spectrum of possibilities. In other cases, 
the “locus of responsibility” [21] might lie more with the human. 

— It is possible that the system being controlled might not be the physical sys- 
tem of ultimate interest, but rather a model of it in the information system. 
In this case, the control takes place on the model, which is used to
improve control of the real system: the “management flight simulator” idea
([17], [12]).

— A question arises about the relationship between control and diagnosis. Diag- 
nosis too is concerned with differences between ideal states and actual states. 
But in diagnosis, the measurables are used to deduce the malfunctioning of 
the system, and controllables to vary behaviour so as to shed more light 
on the malfunction. Diagnosis is not the subject of this paper; nonetheless, 
some knowledge-based techniques such as model-based reasoning do have 
applications in control. 



2 Knowledge and the Problems of Control 

The representation we have shown of a general control system allows us to iden- 
tify some of the problems of control, and to see why knowledge-based techniques 
can help to alleviate them. 

First and most obvious is the problem of incomplete information. The mea- 
surables to which the controller has access may only give a very partial view 
of the situation, perhaps because of physical constraints on their number and 
location, perhaps because they measure only a subset of quantities of interest. 
In the case of anaerobic waste treatment plants, to which we shall return later, 
the simplest and most robust sensors measure elementary properties such as 
temperature and pH. Clearly there are many more properties of the plant which 
are valuable for monitoring its state. 

Related to this point, but at a deeper level, are the problems arising from 
a lack of understanding of how the controlled system functions. The controller 
has access to measurements and can make adjustments to certain controllable 
factors, but how does he, she or it know the consequences of those actions? How 
can it know that adjusting thermostats or flow rates will result in changes to 
behaviour in the desired direction? Now it should be said that in some sim- 
ple control situations there is no need for any understanding of the system’s 
functioning — see [15] for an explanation of simple linear feedback control and 
why this works in many cases even when the underlying process is quite complex 
and not understood. But for complex systems with many dependencies, knowl- 
edge of the functioning at some level can be exploited to improve control — see 
[5] for the use of models to improve diagnostic capabilities. 

To the problems of incomplete information and ill-understood processes we 
may add the problems of diverse information. In managing many macroscopic 
systems such as waste water treatment plants, the operators’ visual impressions
may contain highly valuable information. Not all of our “measurables” are nu- 
merical and read by a mechanical, chemical or electrical sensor. If the operator 
can see black sludge packets at the surface of the clarifier, this suggests that the 
sludge is staying too long in the clarifier. How then can observations of this kind 
be integrated with numerical measurements in the control process? 

A further problem in some domains arises from the timescales of the processes 
involved. A common analogy for this is “steering a super-tanker”. We imagine
trying to direct an enormous oil tanker as it approaches the oil terminal. Changes 
to the rudder will take effect very slowly, and continue inexorably. The worst way 
to manage such a system is by continually making changes based on the perceived 
behaviour at any instant: this will lead to over-correction and instability. 
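The super-tanker effect can be reproduced with a toy model that is not taken from this paper. In the model, the process state changes by the sum of the controller's actions, each action takes effect only a few steps after it is issued, and the controller reacts proportionally to the deviation it currently observes. The function name, gain and delay are illustrative assumptions. With no delay, the state settles towards the target. With a delay of three steps, the same controller keeps pushing after the correction is already under way and overshoots well past the target.

```python
def simulate(gain, delay, x0=10.0, target=0.0, steps=40):
    """Naive proportional control of a sluggish process.

    Each step the controller reacts to the current deviation, but an action
    only takes effect `delay` steps after it is issued.
    """
    xs = [x0]
    actions = []
    for t in range(steps):
        actions.append(-gain * (xs[-1] - target))  # react to what is seen now
        applied = actions[t - delay] if t >= delay else 0.0
        xs.append(xs[-1] + applied)
    return xs


quick = simulate(gain=0.5, delay=0)
sluggish = simulate(gain=0.5, delay=3)
print(min(quick) > 0)  # True: settles towards the target without crossing it
print(sluggish[:9])    # [10.0, 10.0, 10.0, 10.0, 5.0, 0.0, -5.0, -10.0, -12.5]
```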

Another time-based problem is that of adaptation. It may be that the be- 
haviour and properties of the controlled system are changing over time, and that 
therefore the controller should adapt its behaviour accordingly. What worked at 
one point in time may be unsuitable later. Of course, one has to tread very care- 
fully here: the change in behaviour may be due to a malfunction in the system, 
and the controller will find itself attempting to correct for this when the best 
response would in fact be to flag a problem. However, there certainly are cases 
when processes change their characteristics in a normal way over time. 

Finally, we return to the issue highlighted in the previous section of the locus 
of responsibility for the control. If a human is to have meaningful responsibility, 
they must be provided with the information they need to evaluate decisions 
and judgements. A cooperating human-computer system needs to take this into 
account. It is of no use to propose a single course of action for the human to accept
or reject unless they are also given a justification for it that they can
critique. There are also other models for such an interaction: the human might
make a suggestion which the computer system then critiques [19]. 

Fig. 2 shows our representation of a control system with some of these issues 
identified. 



3 Knowledge-Based Techniques 

Before considering the architecture of a control system, we will examine some 
particular knowledge-based techniques that may be used to implement elements 
of the control system. These techniques are knowledge-based in the sense that 
they manipulate explicitly represented knowledge of the domain, the system to 
be controlled, even the process of control itself. There should be some component 
that can be identified as “where the knowledge resides”. For this reason, systems
such as neural networks are excluded from the scope of this paper, since the 
representations they utilise are not explicit embodiments of knowledge [9]. 

For each of the techniques, their use in knowledge-based control will be ex- 
plained, and their advantages summarised. 
3.1 Rule-Based Reasoning



Systems based on production rules have a long and successful history within the 
knowledge-based systems community [10]. The principle is very familiar: rule 
sets containing rules of the form IF <condition> THEN <action> are applied 
to particular situations to draw inferences, which may in turn cause further rules
to fire, eventually leading to conclusions. This is elementary forward chaining;
backward chaining is when hypotheses are tested by evaluating the rules that 
conclude those hypotheses, testing their premises successively until the hypoth- 
esis is confirmed or eliminated. In the context of control systems, rule-based 
reasoning may be applied to making the connection between measurables and 
controllables, as shown schematically in Fig. 3. 
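Forward and backward chaining can be sketched in a few lines of Python. This is a minimal illustration, not an OPS-style production system: rules are assumed to be plain (premises, conclusion) pairs over string facts, and the rule names and the clarifier rules themselves are hypothetical, loosely based on the black-sludge observation discussed earlier. The forward chainer also records which rules fired, which is the simplest form of the rule trace used for justification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    premises: frozenset
    conclusion: str


def forward_chain(rules, facts):
    """Fire rules whose premises are all known until nothing new is inferred.

    Returns the closure of known facts and the trace of fired rules.
    """
    known, trace = set(facts), []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.conclusion not in known and rule.premises <= known:
                known.add(rule.conclusion)
                trace.append(rule.name)
                changed = True
    return known, trace


def backward_chain(rules, facts, goal, visiting=frozenset()):
    """Test a hypothesis by trying the rules that conclude it, recursively."""
    if goal in facts:
        return True
    if goal in visiting:                # guard against circular rule sets
        return False
    visiting = visiting | {goal}
    return any(
        all(backward_chain(rules, facts, p, visiting) for p in rule.premises)
        for rule in rules
        if rule.conclusion == goal
    )


# Hypothetical rule base for illustration only.
RULES = [
    Rule("R1", frozenset({"black sludge at clarifier surface"}), "sludge retained too long"),
    Rule("R2", frozenset({"sludge retained too long"}), "increase sludge withdrawal"),
    Rule("R3", frozenset({"sludge retained too long", "high effluent turbidity"}), "inspect clarifier"),
]
observed = {"black sludge at clarifier surface"}

known, trace = forward_chain(RULES, observed)
print(trace)                                                          # ['R1', 'R2']
print(backward_chain(RULES, observed, "increase sludge withdrawal"))  # True
print(backward_chain(RULES, observed, "inspect clarifier"))           # False
```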
Fig. 3. Rule-based reasoning in control
Rule-based reasoning is frequently employed as one stage of knowledge-based
control systems. Such reasoning is often semi-heuristic, i.e. it expresses knowledge
that is founded in a model of the world, though without an explicit underlying
model in the knowledge base. A typical example, taken from [7], is a rule for the
detection of acidification in an anaerobic waste water treatment plant:

IF pH evolution is low 
AND methane evolution is low
AND there is no feeding overload
AND there is no change in feeding

THEN acidification pre-alarm state and generate report of fact 

In this system the rules are implemented in the language OPS-83. 
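The acidification rule combines numerical trends with yes/no facts that may come from logs or from the operator. A minimal sketch of that combination follows; it assumes that "evolution is low" means a falling trend over a recent window, and the threshold values are hypothetical, since the source rule does not give them (the original system was written in OPS-83, not Python).

```python
def trend(samples):
    """Average change per sample over a window: a crude qualitative abstraction."""
    return (samples[-1] - samples[0]) / (len(samples) - 1)


# Hypothetical thresholds: what counts as a "low" evolution is plant-specific.
PH_TREND_LOW = -0.05        # pH units per sample
METHANE_TREND_LOW = -0.5    # methane flow units per sample


def acidification_rule(ph_samples, methane_samples, feeding_overload, feeding_changed):
    """The rule from the text: numeric trends plus logged or reported facts."""
    ph_low = trend(ph_samples) < PH_TREND_LOW
    methane_low = trend(methane_samples) < METHANE_TREND_LOW
    if ph_low and methane_low and not feeding_overload and not feeding_changed:
        return "acidification pre-alarm", "generate report of fact"
    return None


print(acidification_rule([7.2, 7.1, 7.0, 6.9], [10, 9, 8, 7], False, False))
```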
Rule-based reasoning may also be valuable in other components of a control 
system. It may be used to implement a component for proposing actions to 
the user — what has been called an Option Generator [14]. One approach is to 
use heuristic rules obtained from human experts about the range of actions 
they consider in particular situations to construct a sequence of actions, which 
may then be simulated by a model to ascertain the consequences before being 
executed in reality. 
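The option generator idea can be sketched as: heuristic rules propose candidate actions, a model simulates each one, and only options with acceptable predicted consequences are offered. Everything concrete here is hypothetical: a single retention basin, pumps that each remove a fixed fraction of its volume per step, and the rule conditions.

```python
PUMP_RATE = 0.05   # hypothetical: fraction of basin volume one pump removes per step

# Hypothetical expert heuristics: (condition on situation, number of pumps to consider).
HEURISTICS = [
    (lambda s: True, 0),                                       # always consider doing nothing
    (lambda s: s["level"] > 0.6, 1),                           # high level: one pump
    (lambda s: s["level"] > 0.6 and s["inflow"] > PUMP_RATE, 2),
]


def propose(situation):
    return [pumps for condition, pumps in HEURISTICS if condition(situation)]


def simulate(situation, pumps, steps=10):
    """Toy basin model: predicted peak level if `pumps` run for `steps` steps."""
    level, peak = situation["level"], 0.0
    for _ in range(steps):
        level = max(0.0, level + situation["inflow"] - pumps * PUMP_RATE)
        peak = max(peak, level)
    return peak


def choose(situation, limit=1.0):
    """Keep options whose simulated outcome stays below `limit`; prefer fewest pumps."""
    safe = [p for p in propose(situation) if simulate(situation, p) < limit]
    return min(safe) if safe else None


print(choose({"level": 0.8, "inflow": 0.06}))
```

Preferring the option with the fewest pumps stands in for whatever cost criterion a real operator would apply; the point is that every proposal is checked against the model before it reaches the human.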

Two questions must be asked when considering any choice of knowledge-based
technique: where does the knowledge come from, and how may it be validated?
The answer to the first can take three forms: the knowledge comes directly from
human experts; from textbooks and operations manuals; or is induced from
historical data, using techniques of rule induction or data mining. The first two
have the advantage that explanations based on them will tend to make sense to
human controllers, while the latter may give better coverage in practice but may
also be fragile in new situations.

Rule-based reasoning can be effective for addressing the following problems 
of control. 

— Lack of understanding of the controlled system: a detailed understanding may
not be required if heuristic rules are used, derived either from human experts
or from past history.

— Diverse information can be handled in the pre-conditions of rules — the above 
example rule illustrates this. 

— Justification for humans can be provided by a rule trace, which gives some
degree of explanation (but see [3] for more advanced approaches to explanation).

3.2 Case-Based Reasoning 

Case-based reasoning (CBR) is based on the principle that at least some human
reasoning consists of recognising situations similar to previous experience and
adapting the earlier solutions to the new situation [16]. The basic CBR method
entails the following
steps: 

— Matching the new situation to the library of prior cases, identifying the 
closest matches. 
— Retrieving the solutions corresponding to the matches.

— Combining and modifying the solutions to produce a solution to the new 
case. 

— Storing the new case and its solution (when validated) in the case base. 

Issues in case-based reasoning include indexing, matching, retrieval, retaining 
and generalising cases. 

CBR can be applied to control problems in the recognition of similar situa- 
tions and hence reuse of “what worked before” (Fig. 4). 
Fig. 4. Case-based reasoning in control



The control problems addressed by CBR include: 

— Incomplete information, since a complete description is not necessarily needed 
to perform matching. 

— Lack of understanding of the controlled system, since the method is based 
on recognition of previous cases rather than deep reasoning. 

— Justification, which may be expressed with reference to prior cases. 
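A minimal retrieve/reuse/retain cycle is sketched below, with the assumptions stated plainly: cases are numeric attribute vectors, similarity is a plain Euclidean distance over the attributes that the query actually contains (so incomplete descriptions still match), and the "combining and modifying" step is omitted, so the nearest case's solution is reused unchanged. Attribute names and case contents are hypothetical; a real system would also normalise attribute scales.

```python
import math
from dataclasses import dataclass


@dataclass
class Case:
    situation: dict   # attribute name -> numeric value
    solution: str


class CaseBase:
    def __init__(self, cases=()):
        self.cases = list(cases)

    @staticmethod
    def distance(query, case):
        # Compare only attributes present in both, so incomplete queries still match.
        shared = [a for a in query if a in case.situation]
        if not shared:
            return math.inf
        return math.sqrt(sum((query[a] - case.situation[a]) ** 2 for a in shared) / len(shared))

    def retrieve(self, query, k=1):
        return sorted(self.cases, key=lambda c: self.distance(query, c))[:k]

    def reuse(self, query):
        # Simplification: return the nearest solution unchanged (no adaptation step).
        best = self.retrieve(query, k=1)
        return best[0].solution if best else None

    def retain(self, query, validated_solution):
        self.cases.append(Case(dict(query), validated_solution))


library = CaseBase([
    Case({"ph": 6.8, "gas_flow": 40.0}, "reduce feed rate"),
    Case({"ph": 7.2, "gas_flow": 95.0}, "normal operation"),
])
print(library.reuse({"ph": 6.85}))              # gas_flow unknown; nearest is the first case
library.retain({"ph": 6.85}, "reduce feed rate")
```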



3.3 Model-Based Reasoning 

In this context a model is an explicit knowledge-based representation of some 
aspects of the controlled system, allowing reasoning to take place about its be- 
haviour under varying conditions. A model may be structural (representing the 
physical components of the system) or functional (representing their inputs and 
outputs). Typically a representation will specify physical or logical components 
with relationships between them. For example, a water supply network might be 
represented as a set of demand areas (with associated demand profiles), storage 
reservoirs (with certain capacities), pipes, valves, etc. The reasoning is based on 
simple supply and demand matching. In practice the reasoning may be implemented
by rules, so the two approaches are not mutually exclusive. A context for the use
of models is given in [4]. 
Applied to control problems, if it is possible to construct a model of the
controlled system then it is possible to simulate behaviours and assess their
consequences. This permits “what if” reasoning by either the human or the control
system (Fig. 5).
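A structural model of the kind described for water supply can support "what if" questions with very little machinery. The sketch below assumes a single reservoir feeding several demand areas with hourly demand profiles and a constant supply rate; all names, capacities and profiles are hypothetical. It reports the first hour at which demand can no longer be met.

```python
from dataclasses import dataclass


@dataclass
class DemandArea:
    name: str
    hourly_demand: list   # m3 per hour over one day (hypothetical profile)


@dataclass
class Reservoir:
    capacity: float       # m3
    level: float          # m3 at the start of the day


def first_shortfall(reservoir, supply_per_hour, areas):
    """Simulate supply/demand matching hour by hour; return the first hour the
    reservoir cannot meet demand, or None if the whole day is covered."""
    level = reservoir.level
    total_demand = [sum(hour) for hour in zip(*(a.hourly_demand for a in areas))]
    for hour, demand in enumerate(total_demand):
        level = min(reservoir.capacity, level + supply_per_hour - demand)
        if level < 0:
            return hour
    return None


areas = [
    DemandArea("north", [60] * 6 + [200] * 6 + [60] * 12),
    DemandArea("south", [40] * 6 + [100] * 6 + [40] * 12),
]
tank = Reservoir(capacity=1000, level=500)
print(first_shortfall(tank, 180, areas))   # normal supply: None
print(first_shortfall(tank, 120, areas))   # what if supply drops? 9
```

Comparing the two runs answers the question "what happens if supply drops to two thirds?" before it happens: in this toy example the reservoir runs dry in the fourth hour of the morning peak.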




Fig. 5. Model-based reasoning in control 



The problems addressed by model-based reasoning include: 

— Timescales, since a suitable model will be able to indicate the rates of evo- 
lution of behaviour. 

— Adaptation over time, since the model may be used to determine limits on 
correct functioning and measured values. 

— Justification, based on explanation of the interactions between components. 



3.4 Combining the Techniques 

Most real problems of control will need the application of a variety of techniques 
for greatest effectiveness. As we have seen, individual techniques address different 
aspects of the problems of control. The issue for the system designer is one 
of selecting and combining techniques to address the problems of a particular 
case. This combination may be done at two levels: the reasoning level and the 
architectural level. The architectural level will be the subject of the next section; 
here we take a case study showing how different reasoning techniques may be 
applied to a single problem. 

The TAP-EXTRA project [13] was a Trial Application under the European 
ESPRIT programme in which an emerging software technology, namely the en- 
hancement of information systems with cooperative, explanatory capabilities, 
was applied in a new area to show the benefits and allow the risks to be judged. 
The specific application, called Aleph, was for assisting in flood control in the 
city of Bordeaux, and the co-operative, explanatory capabilities were added using 
the tools and methods developed within the previous ESPRIT project I-SEE. 

The underlying problem is a control problem in which the locus of responsi- 
bility lies with the human operators. The RAMSES control centre in Bordeaux 
is manned permanently and its staff constantly monitor incoming sensor read-
ings (e.g. from rain gauges) and alarm signals (e.g. of a pump that has failed to 
start as expected). In times of heavy rainfall, their task is complicated by the 
likelihood of information overload. The fact that catastrophic flood events are 
rare makes matters more difficult for two reasons: equipment that is seldom used 
is more likely to malfunction, and the controllers’ own experience of such situations
is by definition limited. Furthermore, when a crisis threatens, an on-call
engineer is called out who has ultimate responsibility for the actions taken
but lacks the operators’ intimate familiarity with the flood control system.

A basic alarm filtering and combining application called Aleph had been de- 
veloped and installed in the control centre, but was receiving only limited use. 
The reason was that the messages presented on the screen, although a synthe- 
sized subset of the basic incoming alarm signals, did not provide an adequate 
basis for decision making: they lacked context. Hence the basic control problem 
was one of incomplete knowledge of the situation and the underlying processes. 
The human controllers had many years of experience working in RAMSES, yet 
in situations of crisis they needed more support from the information systems. 
Three specific user needs were identified: 

— Risk assessment, particularly in winter, when the long duration of rainfall makes
it difficult to decide when action must be taken.

— Identification of “mental model mis-matches”, that is, situations which
deviate from the normal and might go unnoticed, for example, a pump which
continues to run when there is no longer any need for it. 

— A synoptic overview of the situation, particularly for the on-call engineer. 

The approach taken to satisfy these needs was a combination of rule-based 
with model-based reasoning, using the model to amplify and interpret the rule- 
based reasoning. For example, the failure of a pump to start is a single alarm 
message. If this message is repeated over a short period of time, then the pump 
can be considered to have failed and should be reported to the user — the job of 
Aleph. Then model-based reasoning about the layout and interconnections of the 
flood control network of pipes, pumps and retention basins is used to produce a
risk assessment. The risk assessment is based on the following structure: 

— The suggestive risk factor (“stimulus”) 

— The risk “content”, i.e. what is there a risk of? 

— Contributing factors 

— Any potentially mitigating circumstances which might exist 

— An overall assessment of the risk and the effect of the other factors. The 
characterization of the risk might include a timescale if this is appropriate. 

Heuristic rules are used to detect the suggestive risk factors and the likely risk 
content; then the static model of the flood control network is queried to
determine other relevant factors, for example, whether the water level in connecting
basins is already high. Such factors allow for an informed and context-sensitive 
presentation of the risk to the user, giving much more helpful information for 
decision making than a simple statement that a certain pump has failed. 
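The TAP-EXTRA combination of heuristic rules with a static network model might be sketched as follows. This is not the Aleph implementation: the repetition window, the number of repeats, the basin-level thresholds and the identifiers `P1`, `B1`, `B2` are all hypothetical. The record mirrors the risk-assessment structure listed above: stimulus, content, contributing factors, mitigating circumstances and an overall assessment.

```python
from dataclasses import dataclass, field


@dataclass
class RiskAssessment:
    stimulus: str
    content: str
    contributing: list = field(default_factory=list)
    mitigating: list = field(default_factory=list)
    overall: str = "unassessed"


def repeated(alarm_times, window=600, min_repeats=3):
    """Heuristic rule: the alarm occurred min_repeats times within `window` seconds."""
    t = sorted(alarm_times)
    return any(t[i + min_repeats - 1] - t[i] <= window for i in range(len(t) - min_repeats + 1))


def assess_pump_alarm(pump, alarm_times, network, basin_levels, high=0.8, low=0.3):
    if not repeated(alarm_times):
        return None                       # isolated alarm: not reported
    risk = RiskAssessment(stimulus=f"pump {pump} failed to start repeatedly",
                          content="local flooding")
    for basin in network[pump]:           # query the static network model
        level = basin_levels[basin]
        if level >= high:
            risk.contributing.append(f"connected basin {basin} is {level:.0%} full")
        elif level <= low:
            risk.mitigating.append(f"connected basin {basin} has spare capacity")
    if risk.contributing:
        risk.overall = "moderate" if risk.mitigating else "high"
    else:
        risk.overall = "low"
    return risk


network = {"P1": ["B1", "B2"]}
levels = {"B1": 0.9, "B2": 0.2}
print(assess_pump_alarm("P1", [0, 120, 300], network, levels))
```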
Fig. 6. The TAP-EXTRA system architecture



The overall architecture of the TAP-EXTRA system is shown in Fig. 6. 
From the point of view of control systems, what we have here is an environ- 
ment in which the crucial requirement is the effective presentation of relevant 
and timely information to the human controller to enable superior decision mak- 
ing. The information system does not in itself cause actions to be taken, or even 
propose actions to the user; nevertheless the combined human-computer system 
is acting as a control system to prevent flooding. 

4 Architectural Issues 

As well as the knowledge integration perspective, there is also a system design 
perspective: how the control system is put together from software components, 
possibly distributed. The motivation for developing a control system in such a 
way may arise if there are to be multiple instances of the controlled system, which 
are generically the same but vary in individual detail: for example, different
versions of a waste treatment plant installed in different locations, with
different configurations and operating conditions.

In such cases, it can make sense to have repositories of knowledge physi- 
cally distinct: human controllers and experts in different places, with a general 
knowledge base at a central location and local knowledge bases associated with
the individual installations. 

4.1 Internet-Based Systems 

It is clear that the Internet offers opportunities for developing such distributed 
knowledge-based control systems. The prospects for expert systems on the
Internet have been identified and studied [8]: the simplest model is the Web-based
system in which the knowledge base is located and reasoning is performed on the
central server, and the users access the system through Web browsers. Such an 
approach has the advantages of making the system highly accessible, using famil- 
iar interfaces, and being portable. However there are other more sophisticated 
models possible, including intelligent agents and cooperating expert systems. 

Alternatively, a distinction may be drawn between “knowledge server”
approaches (as just outlined) and “information client” approaches [2]. The latter differs from
the former in that the client is more active in seeking the information that it 
needs to solve the problem at hand, drawing on distributed knowledge sources 
according to what they can provide. 

The FirstAID system for diagnosis of scanning electron microscopes [2] is an 
example of such a system. The motivation for the approach is that there are few 
experts, they are geographically distant, and the visual aspects are important 
for diagnosis. The basic protocols used are HTTP and CUSeeMe, the latter for
transmitting dynamic images. The diagnostic system uses rule-based reasoning
with extensions, such as the ability to encode actions to be taken on the
instrument as part of the rule consequence. 

For Internet-based control systems of any type, a new consideration enters: 
that of information security. This means not just protecting the information from 
unauthorised parties, but also guarantees of integrity and provision for network 
failures (e.g. in the middle of a sequence of actions being transmitted). 
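Integrity and partial-failure handling for transmitted action sequences can be illustrated with Python's standard `hmac` module. In this sketch, which assumes a pre-shared key and a JSON message format of our own invention, each message carries a sequence identifier, its index and the sequence length; the receiver rejects any message whose HMAC-SHA256 tag does not verify, and executes nothing unless the whole sequence has arrived. Confidentiality (encryption) and key management are deliberately left out, and the action names are hypothetical.

```python
import hashlib
import hmac
import json

SHARED_KEY = b"demo-key-not-for-production"   # hypothetical pre-shared key


def sign(payload):
    body = json.dumps(payload, sort_keys=True)
    mac = hmac.new(SHARED_KEY, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "mac": mac}


def verify(message):
    expected = hmac.new(SHARED_KEY, message["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, message["mac"]):
        raise ValueError("integrity check failed")
    return json.loads(message["body"])


def package(seq_id, actions):
    """Split an action sequence into individually authenticated messages."""
    return [sign({"seq": seq_id, "index": i, "total": len(actions), "action": a})
            for i, a in enumerate(actions)]


def assemble(messages):
    """Return the actions only if every message verifies and the sequence is
    complete; otherwise execute nothing (e.g. the link dropped mid-sequence)."""
    payloads = [verify(m) for m in messages]
    if not payloads:
        return None
    seq, total = payloads[0]["seq"], payloads[0]["total"]
    if any(p["seq"] != seq or p["total"] != total for p in payloads):
        return None
    actions = {p["index"]: p["action"] for p in payloads}
    if sorted(actions) != list(range(total)):
        return None
    return [actions[i] for i in range(total)]


msgs = package(42, ["close valve V3", "start pump P2"])
print(assemble(msgs))        # ['close valve V3', 'start pump P2']
print(assemble(msgs[:1]))    # None: second message lost in transit
```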

4.2 Agent Architectures 

Intelligent agents [11] have attracted a great deal of attention recently in many 
areas of application. Characteristics of agents include autonomy, independence, 
specialisation and cooperative negotiation. Each agent is responsible for its own 
area of problem solving, applying specialist knowledge. An example from the 
control of waste water treatment plants (WWTPs) is given in [1]. Here agents 
embody knowledge of every sub-process within the WWTP, for example the 
settler, the pumping system, COD removal. The agents are responsible for val- 
idation of data, detection of abnormal conditions, prediction of future values, 
etc. There is a supervisor agent responsible for controlling the behaviour of the 
domain agents and activating their communication. Communication is by means 
of a “minimal language”, in fact tables of values (numerical and symbolic). 

It is also worth mentioning blackboard architectures: once a popular architecture
for knowledge-based systems [6], now less fashionable but similar in some
respects to the agent-based approach. Diverse knowledge sources cooperate to 
build up a solution to a problem using a central, globally accessible “blackboard” 
as a common working area. Such an approach is promising for control applica- 
tions because of the integration of different kinds of knowledge and data. An 
example application is given in [18]. 
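The blackboard idea can be sketched with a shared dictionary and a list of knowledge sources, each with a condition ("can I contribute?") and an action. The three sources below, their names, the validation rule and the overload threshold are hypothetical; the control strategy simply runs the first applicable source, whereas real blackboard systems use more elaborate schedulers.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class KnowledgeSource:
    name: str
    can_contribute: Callable[[dict], bool]
    contribute: Callable[[dict], None]


def run_blackboard(blackboard, sources, max_cycles=50):
    """Control loop: repeatedly let one applicable knowledge source update the
    shared blackboard until none can contribute. Returns the firing order."""
    fired = []
    for _ in range(max_cycles):
        ready = [ks for ks in sources if ks.can_contribute(blackboard)]
        if not ready:
            break
        ks = ready[0]                 # simplest control strategy: first applicable
        ks.contribute(blackboard)
        fired.append(ks.name)
    return fired


SOURCES = [
    KnowledgeSource("flow_validator",
                    lambda bb: "raw_flow" in bb and "flow" not in bb,
                    lambda bb: bb.update(flow=max(0.0, bb["raw_flow"]))),
    KnowledgeSource("load_estimator",
                    lambda bb: all(k in bb for k in ("flow", "cod")) and "load" not in bb,
                    lambda bb: bb.update(load=bb["flow"] * bb["cod"])),
    KnowledgeSource("overload_detector",
                    lambda bb: "load" in bb and "diagnosis" not in bb,
                    lambda bb: bb.update(diagnosis="overload" if bb["load"] > 500 else "normal")),
]

board = {"raw_flow": 20.0, "cod": 30.0}
print(run_blackboard(board, SOURCES), board["diagnosis"])
```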
4.3 The TELEMAC Project

The TELEMAC project is a project under the European Commission’s Information
Society Technologies (IST) programme, commencing in September 2001 [20]. It is
concerned with the management of anaerobic digestion in waste treatment,
specifically the wastes resulting from wine and spirit production processes.
This is an unstable biological process. The TELEMAC project aims to develop 
a set of adaptive and customisable tools for small units in order to improve 
the quality of their depollution process, reduce the treatment cost and increase 
the output of derivative products. It emphasizes the synergy expected from the 
merger of robust advanced control algorithms and knowledge-based supervision 
systems. 

The process to be controlled, anaerobic digestion, exemplifies most of the 
problems of control identified in Section 2, and suffers from a lack of tools to
exploit its full potential. The process is based on a complex ecosystem
of anaerobic bacterial species that degrade the organic matter. It presents very 
interesting advantages compared to the traditional aerobic treatment: it has a 
high capacity to degrade difficult substrates at high concentrations, produces 
very little sludge, requires little energy and in some cases it can even recover 
energy using methane combustion (cogeneration). 

But in spite of these advantages, industrial companies are reluctant to use 
anaerobic treatment plants, probably because of the drawback that accompanies its
efficiency: it can become unstable under some circumstances (such as variations in
the process operating conditions). A disturbance can destabilise the process
through the accumulation of intermediate toxic compounds, resulting in loss of
biomass, after which several months are necessary for the reactor to recover. During
this period, no treatment can be performed by the unit. It is therefore a great 
challenge for computer and control sciences to make this process more reliable 
and usable at industrial scale. 

The technical objective of TELEMAC is to provide a set of tools to improve 
the process reliability and quality of managing a wastewater treatment plant 
with a remote centre of expertise using Internet resources. The advanced con- 
trol system must ensure the optimal working of the process and the supervision 
system must set an alarm in case of failure. If the problem is simple enough, the 
supervision system must trigger dedicated automatic control algorithms that 
will help the system to recover. If not tackled automatically, the system must 
decide which human intervention is required: a local technician (e.g. for a pump
failure, leak, ...) or a remote expert. All these incidents and the measures taken
must be recorded in the database to feed back to the supervision system. The TELEMAC
solution must therefore provide an integrated procedure that will increase the 
reliability of the process in order to guarantee that the plants respect environ- 
mental regulations. 
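The escalation policy described for TELEMAC can be read as a small decision procedure that also logs every incident for later use by the supervision system. In the sketch below the classification of incident kinds is a hypothetical stand-in; only "pump failure" and "leak" as local-technician cases come from the project description.

```python
from enum import Enum


class Response(Enum):
    AUTOMATIC_RECOVERY = "trigger dedicated control algorithm"
    LOCAL_TECHNICIAN = "call local technician"
    REMOTE_EXPERT = "consult remote expert"


# Hypothetical classification of incident kinds; a real plant would derive it
# from its fault models and operating history.
AUTO_RECOVERABLE = {"mild_organic_overload"}
NEEDS_ON_SITE_HANDS = {"pump_failure", "leak"}


def escalate(incident_kind, incident_log):
    if incident_kind in AUTO_RECOVERABLE:
        response = Response.AUTOMATIC_RECOVERY
    elif incident_kind in NEEDS_ON_SITE_HANDS:
        response = Response.LOCAL_TECHNICIAN
    else:
        response = Response.REMOTE_EXPERT
    incident_log.append((incident_kind, response))   # feeds back to supervision
    return response


log = []
for kind in ["mild_organic_overload", "leak", "biomass_inhibition"]:
    print(kind, "->", escalate(kind, log).value)
```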

From the point of view of knowledge-based control, there are a number of 
innovations in TELEMAC. 

— The reliability of low information content sensors (the “measurables” ) is en- 
hanced by linking together operational low information level sensors through 
analytical models. This will allow testing of their global coherency and deriv-
ing highly informative estimates of some process variables through software 
sensors (observers). This sensor network will be perfected using the concept 
of aspecific sensors that are based on a robust methodology. 

— The inclusion of fault detection and diagnosis techniques within the data
analysis system, instead of in the supervision system, will enhance the quality
of the smart sensors’ outputs. The fact that the measured value is matched 
with a characterisation of its uncertainty is another innovative feature of the 
proposed smart sensor. The supervision system will then decide more easily 
how to optimise the use of the received data. If they are of high quality, very 
accurate decisions can be made, whereas in the other case, more prudent 
decisions will be adopted. 

— In order to build an efficient control and monitoring procedure, particularly 
in case of failure, models will be developed especially for abnormal work- 
ing conditions. The first step will be to identify the most frequent failures 
(pump failures, inhibition of the biomass), then to reproduce them during 
lab experiments and finally to develop models for these situations. 

— The fault detection and isolation (FDI) module must be able to detect if the 
process is working in a faulty environment, and determine the problem’s ori- 
gin. If a failure is detected, the model corresponding to the symptoms of the 
process will be chosen from the model base developed specially for faulty sit- 
uations. Then, the software sensors and the control algorithms based on the 
selected model will be activated. The supervision system must not only test 
the integrity of the process, it also has to verify that the selected algorithms 
(controllers, software sensors, fault detection, ...) do their job properly. For 
that, it must be able to check the coherence of the algorithms’ outputs with 
their theoretical properties (convergence rate, dynamical behaviour, ...). If 
they turn out to be inefficient, an alarm will be triggered. 

— Significant information will be extracted from events occurring on the plant. 
Data from the sensor network, faults, controller outputs, simulations, expert 
consultations are not regarded as single and isolated events. They are com- 
bined and joined together by the supervision system to enrich the knowledge 
base. 
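Several of these supervision steps can be illustrated in one small sketch. It assumes that a smart sensor delivers each value with a standard deviation, that each fault model in the model base is indexed by a set of symptom names, and that a controller's theoretical property is a geometric convergence rate. The symptom names, the Jaccard-overlap matching, the quarter-step "prudent" rule and all thresholds are hypothetical choices, not TELEMAC's design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmartReading:
    value: float
    std_dev: float          # uncertainty delivered with the measurement


# Hypothetical fault-model base: symptom signature of each abnormal-condition model.
FAULT_MODELS = {
    "pump_failure": {"zero_inflow", "level_rising"},
    "biomass_inhibition": {"gas_flow_drop", "vfa_accumulation", "ph_drop"},
}


def select_model(symptoms):
    """FDI step: pick the fault model whose signature best overlaps the symptoms."""
    if not symptoms:
        return "nominal"

    def overlap(model):
        sig = FAULT_MODELS[model]
        return len(sig & symptoms) / len(sig | symptoms)   # Jaccard similarity

    best = max(FAULT_MODELS, key=overlap)
    return best if overlap(best) > 0 else "unknown"


def feed_adjustment(reading, setpoint, max_step=1.0, rel_tolerance=0.1):
    """Full corrective step for high-quality data, a prudent quarter step otherwise."""
    step = max(-max_step, min(max_step, setpoint - reading.value))
    if reading.std_dev > rel_tolerance * abs(reading.value):
        step *= 0.25
    return step


def converges_as_expected(errors, rate, slack=1.5):
    """Coherence check: errors should shrink at least geometrically at `rate`."""
    return all(abs(e) <= slack * abs(errors[0]) * rate ** i for i, e in enumerate(errors))


print(select_model({"gas_flow_drop", "ph_drop"}))      # biomass_inhibition
print(feed_adjustment(SmartReading(8.0, 0.1), 10.0))   # 1.0
print(feed_adjustment(SmartReading(8.0, 2.0), 10.0))   # 0.25
```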

Fig. 7 shows the TELEMAC approach in comparison with current approaches. 
TELEMAC is a contribution to advancing the state of the art of knowledge- 
based control systems, most particularly in its use of a distributed, Internet- 
based architecture to optimise the distribution of monitoring and problem-solving 
capabilities between human and computer and between local and remote sites. 
It confronts many of the problems of knowledge-based control identified in Sec- 
tion 2. 

Fig. 7. TELEMAC compared with current approaches

5 Conclusion

Knowledge-based control is a broad field which encompasses a wide range of
application domains and problem characteristics. The challenges that arise are
partly due to the complexity of the control problem itself, and partly due to the
inherent difficulties in codifying and deploying knowledge in information
systems. Various knowledge-based techniques are valuable and have been applied,
and the current trend is towards integrating different techniques and distributing
problem solving to take advantage of generalisation and customisation in
knowledge.
Abstract. Modern networked computing systems follow scenarios that
differ from those modeled by classical Turing machines. For example, 
their architecture and functionality may change over time as components 
enter or disappear. Also, as a rule their components interact with each 
other and with the environment at unpredictable times and in unpre- 
dictable manners, and they evolve in ways that are not pre-programmed. 
Finally, although the life span of the individual components may be finite,
the life span of the systems as a whole is practically unlimited. The
examples range from families of cognitive automata to (models of) the 
Internet and to communities of intelligent communicating agents. 

We present several models for describing the computational behaviour of 
evolving interactive systems, in order to characterize their computational 
power and efficiency. The analysis leads to new models of computation, 
including ‘interactive’ Turing machines (ITM’s) with advice and new, 
natural characterizations of non-uniform complexity classes. We will ar- 
gue that ITM’s with advice can serve as an adequate reference model for 
capturing the essence of computations by evolving interactive systems, 
showing that ‘in theory’ the latter are provably more powerful than clas- 
sical systems. 



1 Introduction 

In the twentieth century, computability theory explored the limits of what can be 
digitally computed. The prominent claim, known as the Church-Turing thesis,
asserts that every algorithm can be captured in terms of a standard Turing 
machine. The classical computing scenario consists of a fixed program, a finite 
input supplied in either off-line or online mode, and a meaningful result only if 
the computation halts in finite time. No changes to the program of the machine 
or its ‘architecture’ are allowed in the meantime, intermediate results cannot 
influence the input and no information is carried over to future runs of the
machine. This even applies in the case of ω-Turing machines.

Does this correspond to the way modern networked computers operate? 
Clearly it does not. Today’s systems operate practically uninterruptedly since 
the moment of their installation. They obtain their inputs via many different 
channels at unpredictable times, deliver corresponding responses continuously 
at times when they are ready, accumulate information over the course of their 
entire existence and use it across the boundaries of separate ‘runs’. Also, parts 
of their underlying hardware and software are updated, whenever an ‘external 
agent’ that operates some component decides to do so, without loss of vital data. 

One can object, as many people do when confronted with this observation, 
that this use of computers is simply different from the way assumed in the 
Church-Turing thesis, and that this change is insignificant and can be easily
accommodated by adjusting the original model. While the latter is true, the for- 
mer is questionable: is it really an insignificant change? Answering this question 
will be a main concern of this paper. It will turn out that, at least in theory, the
traditional notion of algorithmic computation must be extended to cover it.

Compared to the classical computing scenario, the essence of the changes in 
modern computing technology we have in mind can be subsumed under three 
complementary issues: interactivity, non-uniform evolution (and adaptivity), and
infinity of operation. Interactivity is often also called ‘reactivity’ [3]. Systems
that exhibit all three qualities together constitute what is meant by evolving
interactive computing.

The most prominent example is the Internet. It can be seen as a wide-area 
computing infrastructure, a kind of global computer ([4]). As a ‘computer’, 
the Internet is hindered by diverse administrative, architectural, and physical 
constraints. Worse still, it is ‘undesigned’, evolving unpredictably, with
unpredictable computing characteristics. Many people, especially in the software
engineering community (cf. [4], [16]), have noticed that we are facing a new computing
phenomenon that does not fit the classical Turing machine paradigm. For in- 
stance, Cardelli [4] writes: 

‘In order to program a global computer we first need to understand its 
model of computation. For example, does computation on the Web cor- 
respond naturally to a traditional model? There are indications that it 
does not. [For example] when browsing, we actively observe the reliabil- 
ity and bandwidth of certain connections (including zero or time-varying 
bandwidth), and we take action on these dynamic quality-of-service ob- 
servables. These observables are not part of traditional models of com- 
putation, and are not handled by traditional languages. What models of 
computation and programming constructs can we develop to automate 
behavior based on such observables?’ 

In this paper we will concentrate only on the first part of Cardelli’s question, 
calling for models of computation that capture the essence of global computing. 
It will lead us to the concepts of non-uniform computation and to a new approach 
to several non-uniform complexity classes. 
Wegner [16,17] went even further, claiming that interactivity alone can
lead to computations that are more powerful than computations by Turing ma- 
chines. In other words, ‘interactive computing’ would violate the Church-Turing 
thesis. We will argue that interactivity alone is not sufficient to break the Tur- 
ing barrier. Interactivity merely extends the objects that one computes on: from 
finite strings to infinite ones, and the feedback mechanism remains computable. 
The computing power of a system could go beyond that of classical computing if 
non-uniformity is considered. Non-uniformity enters when one allows ‘evolving’ 
changes of the underlying hardware and/or software in the course of uninter- 
rupted computing. This is a standard case with the Internet. 

Surprisingly, evolving interactive computing seems to pervade not only the 
current, highly networked information processing systems but also, and mainly
so, the information processing in (societies of) living organisms. It is a
‘technology’ invented by nature long ago, with very much the same characteristics
as those we have described. Note that evolution in our setting is fundamentally
different from the notion of learning that is often mentioned in connection
with interactivity. Learning is usually understood as a software evolution,
whereas we will also and especially consider hardware evolution. 

The structure of the paper is as follows. In Section 2 we introduce an elemen- 
tary, and therefore fundamental, tool for dealing with interactive systems: the 
interactive finite automaton. In Section 3 we introduce sequences of interactive 
finite automata that share global states, leading us to the model of evolving in- 
teractive systems that we have in mind. Next, in Section 4 we describe the basic 
interactive Turing machine (ITM) that will serve as a platform for the design of 
its non-uniform variants. In Section 5 we define the ITM with advice, in Section 
6 the so-called site machine that models a site in a computer network, and finally 
in Section 7 we present the web Turing machine - a model of the Internet. Then, 
in Section 8 we prove the computational equivalence of the non-uniform models. 
In Section 9 we investigate the efficiency of web Turing machine computations 
in more detail, and show that these machines belong to the most efficient com- 
putational devices known in complexity theory. Finally, in Section 10 we will 
discuss some interesting issues related to our results. 

Most of the results mentioned here can be found in more detail in the original 
papers [12,14,13,15,20]. The present paper primarily outlines the overall research 
framework. The notion of sequences of interactive finite automata with global 
states and the respective results are new. 

2 Interactive Finite Automata 

Under the classical scenario, finite automata are used for recognizing finite 
strings. Under the interactive scenario, we consider interactive finite automata 
(IFA) which are a generalization of Mealy automata. They process potentially in- 
finite strings (called streams) of input symbols and produce a potentially infinite 
stream of output symbols, symbol after symbol. To stress the interactiveness, we
assume that there is no input tape: the automaton reads the input stream via
a single input port. Likewise, it produces the output via a single output port.
There is no way to return to symbols once read, except when they are stored
internally. We assume throughout that the input and output symbols are taken
from the alphabet $\Sigma = \{0, 1, \lambda\}$. Symbol $\lambda$ at a port means that ‘presently, there
is neither 0 nor 1 appearing at this port’. The steps of an IFA follow a finite,
Mealy-type transition function.
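A minimal Python sketch may help fix these conventions. It assumes that the Mealy-type transition function is given as a table keyed by (state, input symbol) and that λ is written as the string "λ"; the helper `run_ifa` and the example automaton `parity_ifa` are hypothetical illustrations, not part of the original model.

```python
import itertools
from typing import Dict, Iterable, Iterator, Tuple

LAMBDA = "λ"                       # the empty symbol λ
SIGMA = {"0", "1", LAMBDA}         # the alphabet Σ = {0, 1, λ}

# Mealy-type transition table: (state, input symbol) -> (next state, output symbol)
Transitions = Dict[Tuple[str, str], Tuple[str, str]]


def run_ifa(delta: Transitions, start: str, stream: Iterable[str]) -> Iterator[str]:
    """Translate a (possibly infinite) input stream symbol by symbol.

    Each input symbol is seen exactly once, via the single input port; anything
    the automaton wants to remember must be encoded in its finite state.
    """
    state = start
    for symbol in stream:
        if symbol not in SIGMA:
            raise ValueError(f"{symbol!r} is not in the alphabet")
        state, out = delta[(state, symbol)]
        yield out                  # exactly one output symbol per input symbol, λ included


def parity_ifa() -> Transitions:
    """Hypothetical example: on 0/1, output the parity of the 1s read so far;
    on λ, output λ and keep the state."""
    delta: Transitions = {}
    for s in ("even", "odd"):
        flipped = "odd" if s == "even" else "even"
        delta[(s, LAMBDA)] = (s, LAMBDA)
        delta[(s, "0")] = (s, "0" if s == "even" else "1")
        delta[(s, "1")] = (flipped, "0" if flipped == "even" else "1")
    return delta


finite_run = list(run_ifa(parity_ifa(), "even", ["1", LAMBDA, "1", "0"]))
# An infinite input stream is consumed lazily; we only inspect a finite prefix.
infinite_prefix = list(itertools.islice(run_ifa(parity_ifa(), "even", itertools.cycle("1")), 4))
```

Because `run_ifa` is a generator, it consumes an infinite stream one symbol at a time and never revisits earlier symbols, which mirrors the single-port, no-rewind reading discipline described above.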

Any IFA realizes a translation $\phi$ that transforms infinite input streams over
$\Sigma$ into similar output streams. The $\lambda$’s are not suppressed in the translation.
Clearly, instead of an IFA one could consider any other device that is capable 
of entering into only a finite number of different configurations, such as discrete 
neural (cf. [8]) or neuroidal (cf. [10]) nets, neuromata [9], combinatorial circuits 
(cf. [2]), and so on. From [2,19] the next theorem follows: 

Theorem 1. For translations $\phi : \Sigma^\omega \to \Sigma^\omega$, the following are equivalent:

(a) $\phi$ is realized by an interactive finite (Mealy) automaton.

(b) $\phi$ is realized by a neuroid.

(c) $\phi$ is realized by a discrete neural net.

(d) $\phi$ is realized by a discrete neuroidal net.

(e) $\phi$ is realized by a combinatorial circuit.

In the theorem, all respective devices are assumed to work in an interactive mode
in processing infinite input streams. Devices that read their input in parallel,
such as neural nets or combinatorial circuits, process the input stream in blocks
whose size corresponds to the number of their input ports.

IFA’s embody two features of evolving interactive computing: interactivity
and infinity of operation. The interactivity enables one to describe (albeit a
posteriori) the interaction between the machine and its environment: inputs
that follow some outputs may be reactions to these outputs. Note that we have
not yet imposed the third desideratum of evolving interactive computing: the
evolvability of the underlying computing mechanism. Of course, due to their
simplicity, IFA’s do not have universal computing power either. The
motivation and computational scenario of IFA’s also differ from those usually
considered for $\omega$-automata (cf. [12] or [13]).

3 Sequences of Interactive Finite Automata with Global States

In order to achieve universal computing power and support the evolvability prop- 
erty, we consider sequences of IFA’s. This enables us to realize more complicated 
translations and will also reveal the dependence of computational efficiency on 
the size of the underlying devices. The approach is inspired by the similar practice
in non-uniform complexity theory where, e.g., sequences (or families) of
combinatorial circuits are considered (cf. [2,6]). Our approach differs not only in
the use of different ‘building blocks’, namely IFA’s instead of combinatorial
circuits, but also in the use of a communication mechanism between neighboring
members in a sequence. Both changes are motivated by the need to accommodate all
ingredients of evolving interactive computing.

Definition 1. Let $\mathcal{A} = \{A_1, A_2, \ldots\}$ be a sequence of IFA’s over $\Sigma$, and let $Q_i$
be the set of states of $A_i$. Let $\mathcal{G} = \{G_1, G_2, \ldots\}$ be a sequence of nonempty finite
sets such that $G_i \subseteq Q_i$ and $G_i \subseteq G_{i+1}$. Then $\mathcal{A}$ with $\mathcal{G}$ is called a sequence of
IFA’s with global states.

For a sequence $\mathcal{A}$, there need not exist an algorithmic way to compute the
description of $A_i$, given $i$. Thus, the only way to describe the sequence may
be to enumerate all its members. The set $\bigcup_{i \ge 1} G_i$ is called the set of global states.
From now on we always assume sequences of IFA’s to have global states.

On an infinite input stream over $\Sigma$, a sequence $\mathcal{A}$ computes as follows. At the
start, $A_1$ is the active automaton. It reads input and produces output for a while,
until it passes control to $A_2$. In general, if $A_i$ is the currently active automaton, it
performs its computation using the local states from the set $Q_i - G_i \neq \emptyset$. If an
input symbol causes $A_i$ to enter a global state $g \in G_i$, then $A_i$ stops processing
and passes control to $A_{i+1}$. The input stream is redirected to the input port of
$A_{i+1}$, and $A_{i+1}$ enters state $g \in G_{i+1}$ and continues processing the input stream as
the new active automaton, starting with the next input symbol.

Thus, in effect the input stream is processed by automata with increasing 
index. This models the property of evolution. The ‘transfer’ of control to the 
next automaton is invoked by the automaton currently processing the input. The 
next automaton continues from the same state in which the previous automaton 
stopped. This mechanism enables the transfer of information from the previous 
stage. In a sequence of IFA’s with global states the next automaton can be seen 
as a ‘next generation’ machine. Note that in finite time only a finite part of a 
sequence of IFA’s can have become active. 
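The hand-over rule can be sketched in Python as follows. The callable `member(i)` stands for the (possibly non-computable) sequence, of which only finitely many members are ever requested in finite time; the example automata `counter(i)`, the start state `"c0"` of $A_1$ and all other names are assumptions made for illustration. Each `counter(i)` has $i + 1$ states, so this toy sequence is polynomially bounded.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, Iterator, List, Tuple

LAMBDA = "λ"


@dataclass
class IFA:
    delta: Dict[Tuple[str, str], Tuple[str, str]]  # (state, input) -> (next state, output)
    global_states: FrozenSet[str]                   # G_i, a subset of this automaton's states


def run_sequence(member: Callable[[int], IFA], start: str,
                 stream: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Run A_1, A_2, ... on a single input stream.

    Yields (index of the active automaton, output symbol) for each input symbol.
    """
    i, active, state = 1, member(1), start
    for symbol in stream:
        state, out = active.delta[(state, symbol)]
        yield i, out
        if state in active.global_states:
            # A_i entered a global state g: A_{i+1} takes over in the same state g.
            i += 1
            active = member(i)
            assert state in active.global_states   # G_i ⊆ G_{i+1}


def counter(i: int) -> IFA:
    """Hypothetical A_i: counts the 1s read since it became active; on the i-th
    one it re-enters the global state 'g' and outputs '1', otherwise it outputs
    '0' (and λ on λ). State 'g' behaves like the local state c0."""
    delta: Dict[Tuple[str, str], Tuple[str, str]] = {}
    for s, k in [("g", 0)] + [(f"c{j}", j) for j in range(i)]:
        delta[(s, LAMBDA)] = (f"c{k}", LAMBDA)
        delta[(s, "0")] = (f"c{k}", "0")
        delta[(s, "1")] = ("g", "1") if k + 1 == i else (f"c{k + 1}", "0")
    return IFA(delta, frozenset({"g"}))


trace: List[Tuple[int, str]] = list(run_sequence(counter, "c0", "1" * 10))
```

On ten consecutive 1s, control passes on after the 1st, 3rd, 6th and 10th symbol, so $A_i$ processes exactly $i$ of them before handing its state on to the next generation.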

Alternatively, instead of a sequence of automata, one may consider a single
automaton that ‘evolves’ so that at any time it acts as $A_i$ iff $A_i \in \mathcal{A}$ is the currently
active automaton. That is, the transition function of the automaton at hand is
the same as that of $A_i$ as long as $A_i$ is active. Of course, the condition concerning
the global states must still be maintained. The resulting automaton may
appropriately be called an evolving interactive finite automaton.

A sequence of IFA’s is called polynomially bounded iff there is a polynomial $p$
such that for every $i \ge 1$, the size of $A_i$ is at most $p(i)$. The classes of translations
realized by sequences of IFA’s with global states and polynomially and exponen- 
tially bounded size will be denoted as IFA-POLY and IFA-EXP, respectively. We 
will also consider the classes NA-LOG (the translations realized by sequences 
of neuromata [8] of logarithmic size), NN-POLY (the translations realized by 
sequences of standard recurrent, or cyclic, discrete neural nets of polynomial 
size reading their inputs in parallel), and CC-POLY (the translations realized 
by sequences of combinatorial circuits with a polynomial number of gates).
4 Interactive Turing Machines

The next tool we consider is the interactive Turing machine (ITM). ITM’s differ
from standard TM’s in one important aspect: they allow for an infinite, never-ending
exchange of data with their environment.

An ITM reads input from its input port and produces output at its output 
port. We assume that in each step the machine reads a symbol from its input port 
and writes a symbol to its output port. As before, the input and output symbols 
are taken from the alphabet $\Sigma = \{0, 1, \lambda\}$. We will normally require that an
ITM reacts to any non-empty input by producing a non-empty output symbol 
at its output port after at most some finite time (the interactiveness or finite 
delay condition). The finite-delay condition ensures that if an input stream has 
an infinite number of non-empty symbols, then there must be an infinite number 
of non-empty symbols in the output stream. 

Definition 2. A mapping $\phi : \Sigma^\omega \to \Sigma^\omega$ is called the interactive translation
computed by an ITM $\mathcal{M}$ iff for all $x$ and $y$, $\phi(x) = y$ if and only if $\mathcal{M}$ produces $y$
on input $x$.

As input streams have infinite length, complexity measures for ITM’s cannot 
be defined in terms of the total amount of resources used for processing the 
entire input, as the resulting values will generally be infinite as well. Therefore 
we measure the pace of growth of the resource usage as a function of the
length of the input stream processed so far. 

Definition 3. We say that for a given input stream the ITM $\mathcal{M}$ is of space
complexity $S(t)$ iff for any $t > 0$, after processing $t$ input symbols from the given
stream, no more than $S(t)$ cells on $\mathcal{M}$’s (internal) tapes were ever needed. If this
condition holds for every input stream, then we say that the ITM $\mathcal{M}$ is of space
complexity $S(t)$.

The definition of time complexity is more involved. This is so because, as long 
as empty symbols are counted as legal symbols in input and output streams, any 
initial segment of a computation by an ITM is of linear time complexity w.r.t. 
the input read thus far. Yet it is intuitively clear that computing some non-empty 
output symbols can take more than one step and that in the meantime empty, 
or other ‘ready-made’ symbols must have been produced. We will measure the 
complexity of producing non-empty output symbols from a given input stream, 
at concrete times, by the reaction time. For $t < j$, we say that the $j$-th output
depends on the input prefix of length $t$ if and only if any change of the $(t+1)$-st
and later input symbols causes no change of the output up to and including the
$j$-th symbol. The value $j$ is a (lower) bound on the reaction time for the prefix.

Definition 4. We say that for a given input stream the ITM $\mathcal{M}$ is of reaction
time complexity $T(t)$ iff the reaction time of $\mathcal{M}$ to the input prefix of length $t$
is (upper) bounded by $T(t)$, for any $t > 0$. If this condition holds for any input
stream, then we say that the ITM $\mathcal{M}$ is of reaction time complexity $T(t)$.
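To make Definitions 3 and 4 concrete, here is a hypothetical ITM sketched in Python. It is a simplified model assumed only for illustration: the machine reads and writes exactly one symbol per step, a palindrome check on a stored word of length n is charged n // 2 + 1 steps, λ is output while the machine is busy, and symbols arriving meanwhile are buffered. The list `space_log` records the space profile for the given stream.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Optional, Tuple

LAMBDA = "λ"


def palindrome_itm(stream: Iterable[str], space_log: List[int]) -> Iterator[str]:
    """Hypothetical ITM: for every non-empty input symbol it eventually outputs
    '1' if the non-empty symbols read so far form a palindrome, and '0' otherwise.

    Exactly one symbol is read and one written per step. Checking a stored word
    of length n costs n // 2 + 1 steps (one comparison per step); λ is written
    while the machine is busy. space_log[t - 1] is the number of cells ever used
    after t input symbols, i.e. the value S(t) for this particular stream.
    """
    tape: List[str] = []                  # non-empty symbols taken into account so far
    pending: Deque[str] = deque()         # symbols read while busy (also stored on tape)
    job: Optional[Tuple[int, int, bool]] = None   # (left, right, still a palindrome)
    used = 0
    for symbol in stream:
        if symbol != LAMBDA:
            pending.append(symbol)
        if job is None and pending:       # start checking the next, longer prefix
            tape.append(pending.popleft())
            job = (0, len(tape) - 1, True)
        out = LAMBDA
        if job is not None:
            left, right, ok = job
            if left >= right:             # all pairs compared: the answer is ready
                out = "1" if ok else "0"
                job = None
            else:                         # one comparison in this step
                job = (left + 1, right - 1, ok and tape[left] == tape[right])
        used = max(used, len(tape) + len(pending))
        space_log.append(used)
        yield out


log: List[int] = []
outputs = list(palindrome_itm(["1", "0", "1", LAMBDA, LAMBDA, LAMBDA], log))
```

On the stream 1, 0, 1, λ, λ, λ the answer for the prefix 101 appears two steps after its last symbol was read, with λ output in the meantime; each non-empty input is eventually answered, in the spirit of the finite-delay condition, while the space used grows with the number of non-empty symbols read.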
ITM’s alone do not lead to a non-recursive computational power. Allowing
ITM’s to process infinite input streams only extends the operational scope, com- 
pared to classical TM’s which process merely finite streams. For further details 
see [12,13]. 

5 Interactive Turing Machines with Advice 

Next we introduce interactive Turing machines with advice (ITM/A’s). An ITM/A 
is an ITM as described above, enhanced with advice (cf. [6,2]). Advice functions
allow the insertion of external information into the course of a computation, in 
this way leading to a non-uniform operation. 

Definition 5. An advice function is a function $f : \mathbb{Z}^+ \to \Sigma^*$. An advice function is
called $S(n)$-bounded if for all $n$, the length of $f(n)$ is bounded by $S(n)$.

A standard TM with advice and input of size $n$ is allowed to call for the value
of its advice function only for this particular $n$. An ITM/A can call its advice at
time $t$ only for values $t_1 < t$. To realize such a call, an ITM/A is equipped with
a separate advice tape and a distinguished advice state. By writing the value of
the argument $t_1$ on the advice tape and by entering the advice state at time
$t > t_1$, the value of $f(t_1)$ will appear on the advice tape in a single step. By this
action, the original contents of the advice tape are completely overwritten.
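The advice mechanism can be sketched as a small Python class. All names are hypothetical, and the advice function used here, the binary representation of n·n, is a computable stand-in chosen only so the example runs; real advice functions need not be computable. The class enforces the rule $t_1 < t$, the length bound $S(n)$, and the complete overwriting of the advice tape.

```python
from typing import Callable


class AdviceTape:
    """Sketch of the advice mechanism of an ITM/A (all names are hypothetical).

    At time `now` the machine writes an argument t1 on the advice tape and enters
    the advice state; if t1 < now, the tape is overwritten by f(t1) in one step.
    """

    def __init__(self, advice: Callable[[int], str], bound: Callable[[int], int]) -> None:
        self._advice = advice        # f : Z+ -> Σ*; in general it need not be computable
        self._bound = bound          # S(n): |f(n)| must not exceed S(n)
        self.contents = ""

    def enter_advice_state(self, now: int, argument: int) -> str:
        if not 0 <= argument < now:
            raise ValueError("advice may only be requested for values t1 < t")
        value = self._advice(argument)
        if len(value) > self._bound(argument):
            raise ValueError("advice value exceeds its length bound S(n)")
        self.contents = value        # the previous contents are completely overwritten
        return value


def toy_advice(n: int) -> str:
    """A computable stand-in for f: the binary representation of n * n."""
    return format(n * n, "b")


def toy_bound(n: int) -> int:
    """A logarithmic length bound, so toy_advice is 'log'-bounded advice."""
    return max(1, 2 * n.bit_length())


tape = AdviceTape(toy_advice, toy_bound)
first = tape.enter_advice_state(now=6, argument=5)    # f(5) = binary of 25
second = tape.enter_advice_state(now=7, argument=2)   # overwrites the tape with f(2)
```

Since $n < 2^b$ implies $n^2 < 2^{2b}$, the toy values never exceed the bound, so this advice is logarithmically bounded in the sense of Definition 5.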

We will be interested in advice functions whose values are bounded in length 
by known (computable) functions of t, especially in polynomially or logarith- 
mically bounded functions. Note that the mechanism of advice is very powerful 
and can provide an ITM/A with highly non-recursive ‘assistance’. 

The complexity measures for ITM/A’s are defined as for ITM’s without ad- 
vice (see Definitions 3 and 4). The length of the rewritten part of the advice 
tape is counted in the space complexity of the respective machine, not including 
the actual read-only advice value. 

Definition 6. The class ITM-$C$/$\mathcal{F}$ consists of the translations $\phi$ computed by
ITM-$C$ machines using an advice function from $\mathcal{F}$.

Common choices for ITM-$C$ that we shall use are: ITM-LOGSPACE
(deterministic logarithmic space), ITM-PTIME (deterministic polynomial
time), and ITM-PSPACE (polynomial space). Common choices for $\mathcal{F}$ are
log (logarithmically bounded advice functions) and poly (polynomially bounded
advice functions).

For completeness we show that ITM’s with advice are indeed more powerful 
than ITM’s without advice. (The result also follows from a countability argu- 
ment.) Care must be taken that the finite-delay condition is correctly observed. 

Consider the translation $\kappa$ defined as follows. As a special input, we first
consider the string consisting of the infinite enumeration of all Turing machine
descriptions, in blocks of non-decreasing size. For this input, the translation
should assign to each machine description a ‘1’ if and only if the machine at
hand accepts its own description, and ‘0’ otherwise. If the input is not of this
form and starts to differ at some point from the blocked form described, then
$\kappa$ is assumed to work as described up until the last complete encoding in the
sequence, and then to copy every (empty or nonempty) symbol that follows after
this.

Lemma 1. Translation $\kappa$ can be realized by an ITM/A, but there is no ITM
(without advice) that can realize $\kappa$.

Proof (sketch): Define the function $f$ that to each $n$ assigns the description of
the ‘busy beaver machine’ of length $n$. The busy beaver machine is a Turing
machine which, among all machines with encodings of length $n$, performs the
maximum number of steps before halting, on an input that is equal to its own
description. If no machine description of size $n$ exists, then $f(n)$ is assigned the
empty string.

Now design an ITM A using advice $f$ as follows. A checks every time whether
the input stream contains a ‘next’ block as expected, i.e. a next machine descrip- 
tion. If the next input segment is not a block as expected, A will know within 
finite time that this is the case (because the blocks must come ordered by size). 
If the next input segment is not a valid encoding, A copies the segment to output 
and then copies every (empty or nonempty) symbol that follows after this. This 
is consistent with k and satisfies the finite-delay condition. 

If the input stream presents A with a next block that is the valid description
$w$ of a Turing machine $M$, A works as follows. Let $|w| = n$. A calls its advice for
value $n$ and gets the description $\langle B \rangle$ of length $n$ of the respective busy beaver
machine $B$. Now A alternately simulates one step of $M$ on input $w$ and one step
of $B$ on input $\langle B \rangle$. Since $B$ halts, one of the two simulations must be the first
to halt. Moreover, by the maximality of $B$, if $M$ halts at all, it halts within the
number of steps taken by $B$. If it is the simulation of $M$ that halts first, then A
‘accepts’ $w$, i.e., A outputs 1. Otherwise, A outputs 0. Thus, A realizes $\kappa$ and, as
it satisfies the finite-delay condition in all cases, it is an ITM/A.
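The alternating simulation at the heart of this proof can be sketched in Python. The modelling choice is an assumption made for illustration: each machine is represented by an iterator that performs one simulated step per `next()` call and is exhausted when the machine halts. The stand-ins `run_for` and `run_forever` are hypothetical and do not simulate actual Turing machines.

```python
from typing import Iterator

_HALTED = object()   # sentinel returned by next() once a simulation has halted


def halts_first(m_steps: Iterator[None], b_steps: Iterator[None]) -> str:
    """Alternate one step of M with one step of B and return the output bit.

    m_steps / b_steps are hypothetical step-by-step simulations: each next()
    performs one step, and exhaustion means the machine halted. B is the busy
    beaver obtained from the advice, so it halts and the loop terminates.
    """
    while True:
        if next(m_steps, _HALTED) is _HALTED:
            return "1"   # M halted no later than B: M accepts its own description
        if next(b_steps, _HALTED) is _HALTED:
            return "0"   # B halted first; by maximality of B, M never halts


def run_for(k: int) -> Iterator[None]:
    """Stand-in for a machine that halts after exactly k steps."""
    for _ in range(k):
        yield None


def run_forever() -> Iterator[None]:
    """Stand-in for a machine that never halts."""
    while True:
        yield None
```

Stepping $M$ before $B$ in each round means that a tie (both halting after the same number of steps) is answered with 1, which is correct because $M$ did halt.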

The second part of the lemma is proved by using a modification of the stan- 
dard diagonal argument. For details, see [15]. □ 

The lemma serves as a means for proving the super-Turing computing power of
machines that are computationally equivalent to ITM/A’s. As a first result of
this kind we prove that sequences of IFA’s with global states are equivalent to
ITM/A’s, implying that the former also possess super-Turing computing power.

Theorem 2. For translations $\phi : \Sigma^\omega \to \Sigma^\omega$, the following are equivalent:

(a) $\phi$ is computed by a polynomially bounded sequence $\mathcal{A}$ of IFA’s with global
states.

(b) $\phi$ is computed by a logarithmically space-bounded ITM/A $\mathcal{M}$ with polynomially
bounded advice.

Proof (sketch): (a) $\Rightarrow$ (b). Let $\phi$ be computed by $\mathcal{A}$. Simulate the action of $\mathcal{A}$ step
by step using an ITM $\mathcal{M}$ with the following advice function. It is invoked
each time an automaton $A_i$ in $\mathcal{A}$ that is currently reading the input enters a global
state $g \in G_i$. At this time the advice function returns the description of $A_{i+1}$,
and $\mathcal{M}$ proceeds with the simulation of $\mathcal{A}$ by simulating $A_{i+1}$ starting from state
$g \in G_{i+1}$. To simulate $A_{i+1}$, logarithmic space is enough for $\mathcal{M}$, since all it has
to store is a pointer to the advice tape remembering the current state of $A_{i+1}$.

(b) $\Rightarrow$ (a). For the reverse simulation, split the input string into blocks of
size $1, 2, 3, \ldots, i, \ldots$ and consider first the case when $\mathcal{M}$ does not call its advice.
Sequence $\mathcal{A}$ will be designed so that the $i$-th automaton in the sequence can ‘continue’
the processing of the $i$-th block, making use of its local states. Each automaton $A_i$
is designed as a simulator of $\mathcal{M}$ on input prefixes of length
$\ell_i = 1 + 2 + 3 + \cdots + i = i(i+1)/2$. On these prefixes $\mathcal{M}$ needs $O(\log \ell_i) = O(\log i)$
space and therefore can be in only $O(p(i))$ different configurations, for some
polynomial $p$. This design warrants not only that $A_i$ has enough states to represent
each configuration of $\mathcal{M}$ on an input prefix of length $O(\ell_i)$, but it also enables the
‘information transfer’, via the corresponding global states, from $A_i$ to $A_{i+1}$ before
the ‘storage capacity’ of $A_i$ is exhausted. Hence, $\mathcal{A}$ can remain polynomially bounded
for this case.
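The block arithmetic used in this direction can be checked with a few lines of Python. The helper names are hypothetical; the sketch maps an input position to its block and confirms numerically that a counter up to $\ell_i$ fits in $O(\log i)$ bits.

```python
import math
from typing import Tuple


def prefix_length(i: int) -> int:
    """ℓ_i = 1 + 2 + ... + i: the input length covered by blocks 1, ..., i."""
    return i * (i + 1) // 2


def block_of(position: int) -> Tuple[int, int]:
    """Return (block index i, 1-based offset within block i) for a 1-based input position."""
    i = (math.isqrt(8 * position + 1) - 1) // 2   # largest i with prefix_length(i) <= position
    if prefix_length(i) < position:
        i += 1                                    # the position lies in the next block
    return i, position - prefix_length(i - 1)


positions = [block_of(p) for p in range(1, 8)]
# ℓ_i < (i + 1)^2, so a counter up to ℓ_i needs at most 2 * (bits of i + 1) bits, i.e. O(log i) space.
log_space_ok = all(prefix_length(i).bit_length() <= 2 * (i + 1).bit_length() for i in range(1, 10_000))
```

Integer square roots (`math.isqrt`) avoid floating-point rounding when locating the block of very large positions.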

Consider now the case when $\mathcal{M}$ can call its advice, which is $q$-bounded for
some polynomial $q$. In input block $i$ this can happen up to $i$ times. At these
moments the advice values will be of size $O(q(i))$, and at most $O(i^2)$ of them are
needed on a prefix of length $\ell_i$. One can take this into account by modifying the
$A_i$’s so that they have all these advice values encoded in their states, and by
simulating $\mathcal{M}$ on the given prefix of the input, this time also including the use of
advice by $\mathcal{M}$. The resulting sequence of IFA’s with global states is still polynomially
bounded.

It is clear that in the given simulations $\mathcal{A}$ satisfies the finite-delay condition
iff $\mathcal{M}$ does. □

The theorem implies several analogues for other types of computing devices 
operating with finite configuration spaces. To circumvent the different input- 
output conventions in some cases, we call two complexity classes ‘equal’ only 
when the devices corresponding to both classes read their inputs sequentially; 
otherwise, when the devices in one class read their inputs in parallel, we say that 
they ‘correspond’. 

Theorem 3. The following relations hold:

(a) IFA-POLY equals ITM-LOGSPACE/poly.

(b) NA-LOG equals ITM-LOGSPACE/log.

(c) CC-POLY corresponds to ITM-PTIME/poly.

(d) NN-POLY equals ITM-PSPACE/poly.

(e) IFA-EXP equals ITM-PSPACE/exp.

For later reference we let $\mathrm{DSPACE}(S_1(t))/\mathrm{advice}(S_2(t))$ denote the
complexity class of all $S_1(t)$-space bounded deterministic TM computations making use
of $S_2(t)$-space bounded advice.
6 Site Machines

Our next aim is to consider networks of machines. Under this scenario the in- 
dividual interacting machines will be called site machines, or simply sites. A 
site machine is an ITM enhanced by a mechanism allowing message sending and 
receiving, very much like the well-known I/O automata [7], but it also allows the
‘instantaneous’ external influencing of its computational behaviour by changes 
of its transition function in the course of the interaction with the environment. 
More precisely, sites are viewed as follows. 

Individual sites in the network are identified by their address, some symbolic 
number. The addresses of the sites are managed by a special mechanism that 
exists outside of the site machines (see Section 7). To support efficient
communication among the sites, the respective ITM’s are equipped with an
internet tape. This is a tape whose contents can be sent to any other site. To 
do so, the sending machine must write the address of the receiving site and 
its own ‘return’ address, followed by the message, in an agreed-upon syntax, to 
its internet tape. By entering into a special distinguished state, the message is 
sent to the site with the given address. Messages sent to sites with non-existing 
addresses at that time do not leave the sending machine and the sending machine 
is informed about this by transiting to another special state. 

When a message has been sent successfully, the internet tape of the sending machine
becomes empty in a single step. The message arrives at the receiving machine
after some finite time. If at that time the receiving machine finds itself in a 
distinguished ‘message expected’ state (which can be superimposed onto other 
states) then the message is written onto its internet tape, in a single step. The 
receiver is informed about the incoming message by (enforced) entering into a 
distinguished state called ‘message obtained’. Then the receiving machine can 
read the message, or copy it onto an auxiliary tape. After reading the whole 
message the machine can enter into a ‘message expected’ state again. When it 
does, its internet tape is automatically emptied in one step. 

Otherwise, if the receiving machine is not ready to obtain a message (meaning 
that the machine is engaged in writing onto or reading from its internet tape), 
the message enters into a queue and its delivery is tried again in the next step. 
It may happen that two or more messages arrive at a site simultaneously. This
‘write conflict’ is resolved by giving priority to the sending machine with the lowest
address, and the remaining messages enter the queue at the site.
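The delivery protocol can be sketched as a single-threaded Python simulation. Several details are assumptions not fixed by the text: time advances only through explicit calls to `step(now)`, the transfer delay is passed in by the caller (it plays the role of the message transfer function of Section 7), and queued messages are retried in first-in, first-out order, with simultaneous arrivals at one site queued lowest sender address first. All class and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional, Tuple

Message = Tuple[int, str]   # (return address of the sender, message text)


@dataclass
class Site:
    address: int
    expecting: bool = True                      # the 'message expected' state
    internet_tape: Optional[Message] = None
    queue: Deque[Message] = field(default_factory=deque)

    def finish_reading(self) -> Optional[Message]:
        """Read the delivered message, then re-enter 'message expected'; the tape is emptied."""
        message, self.internet_tape, self.expecting = self.internet_tape, None, True
        return message


class Network:
    def __init__(self, addresses: List[int]) -> None:
        self.sites: Dict[int, Site] = {a: Site(a) for a in addresses}
        self.in_transit: List[Tuple[int, int, int, str]] = []  # (arrival, receiver, sender, text)

    def send(self, now: int, sender: int, receiver: int, text: str, delay: int) -> bool:
        """Return False if the address does not exist: the message never leaves the sender."""
        if receiver not in self.sites:
            return False
        self.in_transit.append((now + delay, receiver, sender, text))
        return True

    def step(self, now: int) -> None:
        arriving = sorted((r, s, m) for (t, r, s, m) in self.in_transit if t <= now)
        self.in_transit = [x for x in self.in_transit if x[0] > now]
        # Simultaneous arrivals at one receiver are queued lowest sender address first.
        for receiver, sender, text in arriving:
            self.sites[receiver].queue.append((sender, text))
        for site in self.sites.values():
            if site.expecting and site.queue:       # otherwise delivery is retried next step
                site.internet_tape = site.queue.popleft()
                site.expecting = False              # enforced 'message obtained' state
```

In this sketch a site that is still reading its internet tape simply keeps its queue, so pending messages are offered again at every subsequent step until the site re-enters the ‘message expected’ state.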

Each site is operated by an (external) agent. An agent can work in two 
modes: network mode, in which its machine is logged-in and can communicate 
with other sites, and stand-alone mode, in which its machine is not logged-in. 
Switching between the two modes is done by the agent with a special instruction. 

By entering a suitable input sequence via the input port in network mode, 
an agent can instruct its machine to do various specific things. First, the agent 
can instruct it to operate its current ‘program’, sending or receiving messages, 
and performing any computation making use of all data stored on the machine’s 
working tapes, on its internet tape, and the data read from the input port. How- 
ever, while working in network mode the agent is not allowed to change the 
machine’s hardware and software, i.e., its transition function. This can only be
done in stand-alone mode. A change of a transition function may change the 
number of the machine’s tapes, its working alphabet and the number of states. 
Such an action is done in finite time, during which the machine is not
considered to be part of the network. Changing the transition function does not
affect the data written on the machine’s tapes at that time (except for the case 
when the number of tapes is decreased, when only data on the remaining tapes 
persist). After changing the transition function, the agent switches back to net- 
work mode. The machine then continues operating following the new transition 
function. Only the inputs read during network mode are considered to be part 
of the input stream. The same holds for the output stream. 

The instantaneous description (ID) of a site machine after performing t steps 
is given by the description of all its working tapes (including its internet tape), 
the current symbol at its input port, and the corresponding state of its finite 
control at time t. The current position of the tape heads is assumed to be marked 
by special symbols on the respective tape descriptions. 

At time $t$, the ‘program’ of the site machine $M$ is described by a
binary code denoted $\langle M \rangle$. It encodes, in an agreed-upon syntax, the transition
function of the machine. Note that at different times t, a machine with the same 
address can be described by (operationally) different codes, depending upon the 
activities of its agent. 

To formally describe a site machine, we assume that there is a site encoding 
function $S$ that maps, for each time $t$, the address $i$ of a machine to its encoding
at that time. The configuration of a site at time t > 0 after processing t inputs 
consists of its address, followed by its encoding and its ID at that time. 

Theorem 4. For interactive translations $\phi : \Sigma^\omega \to \Sigma^\omega$, the following are equivalent:

(a) $\phi$ is computed by a site machine.

(b) $\phi$ is computed by an ITM $\mathcal{M}$ with advice.

Proof (sketch): If $\phi$ is computed by a site machine, then the value of the site
encoding function at time t can serve as an advice to the simulating ITM/A. 
Vice versa, when the latter machine has to be simulated by a site machine, 
then each advice reading can be substituted by a site machine update where the 
advice value is encoded in the description of the update. □ 

7 The Web Turing Machine 

The ultimate non-uniform model we introduce is the web Turing machine. A 
web Turing machine (WTM) is a ‘time-varying’ finite set of interacting sites. 
The cardinality of this set, the programs of the machines in the set, as well as 
the message delivery delays can unpredictably vary with time. We only assume 
that the sites share the same notion of time, i.e., we assume a uniform time-scale 
within a given WTM. 
Let $\mathbb{Z}^+$ denote the set of non-negative integers, and let $\mathbb{N}$ denote the set of
natural numbers. 

Definition 7. A web Turing machine $\mathcal{G}$ is a triple $\mathcal{G} = (\alpha, \delta, \mu)$ where:

— $\alpha : \mathbb{Z}^+ \to 2^{\mathbb{N}}$ is the so-called address function which to each time $t > 0$
assigns the finite set of addresses of those sites that at that time are in
network mode. Thus, at each time $t > 0$, $\mathcal{G}$ consists of the $|\alpha(t)|$ sites from
the set $S_t = \{M_i \mid i \in \alpha(t)\}$, where $M_i$ is the site at address $i$.

— $\delta : \mathbb{Z}^+ \times \mathbb{Z}^+ \to \Sigma^*$ is the so-called encoding function which to each time
$t > 0$ and address $i \in \alpha(t)$ assigns the encoding $\langle M_i \rangle$ of the respective site
$M_i$ at that time at that address.

— $\mu : \mathbb{Z}^+ \times \mathbb{Z}^+ \times \mathbb{Z}^+ \to \mathbb{N}$ is the so-called message transfer function which to
each sending site $i$, each receiving site $j$ and each time $t > 0$ assigns the
duration of the transfer of a message sent from $i$ to $j$ at time $t$,
for $i, j \in \alpha(t)$.

The description of a WTM at time t > 0 is given by the set of encodings of all 
its sites at that time. The configuration of a WTM at time t > 0 corresponding 
to the input read thus far by each site consists of the list of configurations of its 
sites at that time. The list is ordered according to the addresses of the sites in 
the list. 

A computation of a WTM proceeds as follows. At each time, each site which is
in network mode and whose address is among the addresses given by the address
function for this time reads a symbol from $\Sigma$, possibly the empty symbol, from
its input. Depending on this symbol and on its current configuration, the machine
performs its next move by updating its tapes and state and outputting a symbol
(possibly $\lambda$) to its output, in accordance with its transition function. Within a
move the machine can send a message to or receive a message from another
machine. Also, when the machine is in stand-alone mode, its agent can modify
the transition function of the machine or the data represented on the machine’s
tape. Moreover, at any time an agent can ‘log in’ (‘log out’) a site into (from)
the WTM by entering the respective mode of operation. This fact is recorded by
the values of the functions $\alpha$ and $\delta$, which must change accordingly at that time
to reflect the new situation.

Any WTM acts as a translator which at each time reads a single input symbol 
and produces a single output symbol at each site (some of the symbols may be 
empty). In this way a WTM computes a mapping from finite or infinite streams
of input symbols at the sites to similar streams of output symbols. The number 
of streams varies along with the number of sites. The incoming messages are 
not considered to be a part of the input (as they arrive over different ports). 
Of course, the result of a translation does depend on the messages received at 
individual sites and on their arrival times at these sites. However, all messages are 
results of (internal) computations and therefore their sending times are uniquely 
determined; the arrival times at their destinations are given by $\mu$. Therefore,
for a given ‘packed’ stream of inputs (with each packed symbol unfolding to an 
input at every site), the result of the translation is uniquely determined. 
Definition 8. Let $\mathcal{G} = (\alpha, \delta, \mu)$ be a WTM. The web translation computed by
$\mathcal{G}$ is the mapping $\mathcal{T}$ such that for all packed streams $x$ and $y$, $\mathcal{T}(x) = y$ iff on
input $x$ to its sites, machine $\mathcal{G}$ produces $y$ at its sites.

The space complexity $S(t)$ of $\mathcal{G}$ at time $t$ is the maximum space consumed
by any site of $\mathcal{G}$, over all input streams of length $t$. The respective complexity
class will be denoted as WTM-DSPACE($S(t)$). By WTM-PSPACE
and WTM-DLOGSPACE we will denote the classes of all polynomially and
logarithmically space-bounded web translations, respectively. For a further
discussion of the complexity issues, see Section 9.

We conclude this section with a few comments related to the definition of a
WTM and its operation. First, note that we did not require $\alpha$, $\delta$ or $\mu$ to
be recursive functions. Indeed, in general there is no known computable
relation between the time and the addresses of the site machines, nor between
the addresses and the site descriptions, nor between sender-receiver addresses and
message transfer times. Thus, for each $t > 0$ the three functions are given by
finite tables at best. Second, note that we assumed that at each time $\mathcal{G}$ consists
of only a finite number of sites, as implied by the definition of $\alpha$. Also note that
the transfer time of a message depends not only on the addresses of the sending
and receiving sites but also on the message issuing time. This means that even
if the same message is sent from $i$ to $j$ at different times, the respective message
transfer times can differ. Message delivery time is assumed to be independent
of the message length. This assumption may be seen as too liberal, but such
dependencies can be considered to be amortized over the time needed to write
the message to the internet tape.
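Since $\alpha$, $\delta$ and $\mu$ are given by finite tables at best, a finite-horizon record of a WTM can be sketched in Python as plain lookup tables. The class name, the addresses and the encodings below are hypothetical illustrations; the example shows a site logging in at time 2 and the same sender-receiver pair getting different transfer durations at different send times.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass
class WTMTables:
    """Finite-horizon record of a WTM G = (α, δ, μ) as plain lookup tables.

    The tables are only recorded, never computed: like α, δ and μ themselves,
    they need not follow any algorithmic rule.
    """
    alpha: Dict[int, FrozenSet[int]]           # time t -> addresses in network mode
    delta: Dict[Tuple[int, int], str]          # (t, address i) -> encoding <M_i>
    mu: Dict[Tuple[int, int, int], int]        # (sender i, receiver j, send time t) -> duration

    def description(self, t: int) -> Dict[int, str]:
        """The description of the WTM at time t: the encodings of all its sites."""
        return {i: self.delta[(t, i)] for i in sorted(self.alpha[t])}

    def arrival_time(self, i: int, j: int, t: int) -> int:
        """Arrival time of a message sent from i to j at time t (needs i, j in α(t))."""
        if i not in self.alpha[t] or j not in self.alpha[t]:
            raise KeyError("mu is only defined for sites i, j in alpha(t)")
        return t + self.mu[(i, j, t)]


g = WTMTables(
    alpha={1: frozenset({3, 5}), 2: frozenset({3, 5, 7})},   # site 7 logs in at time 2
    delta={(1, 3): "0110", (1, 5): "10", (2, 3): "0110", (2, 5): "10", (2, 7): "1"},
    mu={(3, 5, 1): 4, (3, 5, 2): 1},                          # same pair, different send times
)
```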

8 The Power of Web Computing 

At each moment in time, the architecture and the functionality of a WTM are 
formally described by its functions $\alpha$ and $\delta$. These two functions model the fact
that in practice (as in the case of the Internet) the evolution of the machine 
depends both on the input to the individual sites and on the decisions of the 
respective agents from which the changes in network architecture and site func- 
tionality may result. The agent decisions may in turn also depend on the results 
of previous computations and on messages received at individual sites as seen 
by their respective agents. Under this scenario the description of a WTM may 
change from time to time in a completely unpredictable manner. 

Due to the finiteness of a WTM at each time, its encoding function $\delta$ restricted to
any given time is always finite. Nevertheless, the size of the encoding function over
its entire existence, for $t = 1, 2, \ldots$, is in general infinite. Intuitively, this is the
reason why a WTM cannot be simulated by a single ITM with a finite encoding.
However, for each time $t$ the encoding of a WTM can be provided by an advice
function, as shown in the following theorem.

There is one technical problem in the simulation of a WTM by an ITM/A.
Namely, by its very definition a WTM computes a mapping from packed input
streams to packed output streams, with packed symbols of variable size. The
respective symbols are read and produced in parallel, synchronously at all sites.
However, a normal TM, working as a translator of infinite input streams into
infinite output streams using a single input and a single output tape, cannot
read (and produce) packed symbols of variable size in one step, in a parallel
manner. What it can do is read and produce packed symbols component-wise,
in a sequential manner.

In order to solve this technical problem we assume that the simulating
ITM/A has a specific ‘architecture’ tailored to the problem at hand. First, we
assume that it has a single, infinite, one-way, read-only input tape on which the
original stream of packed inputs to G’s sites, with $\{i_1, i_2, \dots, i_k\} = \alpha(t)$,
is written as follows: for consecutive $t = 1, 2, \dots$, it contains a ‘block’ consisting
of the inputs to the sites $i_1, i_2, \dots, i_k$ at time t. Blocks and input symbols are
separated by suitable marking symbols. Second, we assume that the ITM/A has
one infinite, one-way, write-only output tape to which the outputs of these sites
are written, again in a block-wise manner. The pair of infinite streams of input
and output symbols thus obtained is called the sequential representation of a
web translation F.
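As a concrete illustration of the sequential representation, the Python sketch below flattens a finite prefix of a packed input stream into a single stream of symbols and reads it back. It is a minimal sketch under assumptions that are not made in the text: a packed input at time t is modelled as a dictionary from site addresses to single input symbols, the sites of a block are listed in sorted address order, and the hypothetical markers `#` and `|` (assumed not to occur in the input alphabet) stand for the ‘suitable marking symbols’. The list `alpha` passed to the decoder plays the role of the advice that tells the reader which sites exist at each time, so addresses need not appear on the tape.

```python
from typing import Dict, List, Sequence, Set

BLOCK_END = "#"  # hypothetical marker closing the block of time t
SEP = "|"        # hypothetical marker separating the input symbols of one block


def to_sequential(packed: Sequence[Dict[str, str]]) -> List[str]:
    """Flatten packed inputs (one dict per time step t = 1, 2, ...) into one
    symbol stream: the inputs of the sites in alpha(t), in sorted address
    order, followed by a block marker."""
    tape: List[str] = []
    for block in packed:
        for k, address in enumerate(sorted(block)):
            if k > 0:
                tape.append(SEP)
            tape.append(block[address])
        tape.append(BLOCK_END)
    return tape


def from_sequential(tape: Sequence[str], alpha: Sequence[Set[str]]) -> List[Dict[str, str]]:
    """Inverse mapping; alpha[t - 1] is the set of sites present at time t."""
    blocks: List[Dict[str, str]] = []
    symbols: List[str] = []
    for sym in tape:
        if sym == BLOCK_END:
            sites = sorted(alpha[len(blocks)])
            blocks.append(dict(zip(sites, symbols)))
            symbols = []
        elif sym != SEP:
            symbols.append(sym)
    return blocks
```

For instance, two blocks `{"a": "0", "b": "1"}` and `{"a": "1", "b": "1", "c": "0"}` become the stream `0 | 1 # 1 | 1 | 0 #`; decoding it needs the site sets, which in the theorem come from the advice.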

Theorem 5. For every WTM G there exists a single ITM/A A that acts as a
sequential translator of the web translation computed by G, and vice versa.

Proof (sketch): On its tapes A keeps the ID’s of all sites in $G = (\alpha, \delta, \mu)$,
together with their current modes. The values of all three functions $\alpha$, $\delta$ and $\mu$
are stored in the advice. Thanks to this, A can sequentially update the sites in
accordance with the instructions and updates performed by each site.

The idea of the reverse simulation is to show that a single site operated by
a suitable agent can simulate an ITM/A. The role of the agent is to deliver
the values of the advice function at the times when they are needed. The machine
can do so by switching to stand-alone mode and letting its agent exchange its program
for the program that has the value of the advice encoded in its states. Then the
interrupted computation resumes. The details are given in [15]. □
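The left-to-right direction of this proof sketch can be pictured with a toy Python model in which the simulating machine keeps a mirror copy of the site states and, for each input block, asks an advice function for the description of the WTM at time t before updating the sites one after another. This is a deliberately simplified sketch, not the construction of [15]: sites are modelled as finite transducers with a hypothetical transition function `(state, input) -> (new_state, output)`, messages between sites are ignored, and `advice` is an ordinary (hence computable) Python function, whereas a genuine advice function need not be computable.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical site program: (state, input symbol) -> (new state, output symbol).
Program = Callable[[str, str], Tuple[str, str]]
# Hypothetical advice: time t -> programs of all sites in alpha(t).
Advice = Callable[[int], Dict[str, Program]]


def simulate_wtm(blocks: List[Dict[str, str]], advice: Advice,
                 init: str = "q0") -> List[Dict[str, str]]:
    """Update every site of the current WTM sequentially, block by block."""
    states: Dict[str, str] = {}            # mirror copy of the site states
    outputs: List[Dict[str, str]] = []
    for t, block in enumerate(blocks, start=1):
        programs = advice(t)                # description of the WTM at time t
        out: Dict[str, str] = {}
        for site in sorted(programs):       # one site after another, not in parallel
            state = states.get(site, init)  # newly created sites start in `init`
            new_state, symbol = programs[site](state, block.get(site, ""))
            states[site] = new_state
            out[site] = symbol
        # sites that are no longer in alpha(t) are forgotten
        states = {s: q for s, q in states.items() if s in programs}
        outputs.append(out)
    return outputs
```

With an advice that returns a single parity-computing site for t = 1 and adds a second, echoing site from t = 2 on, the output blocks change shape exactly when the advice changes the architecture.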



9 The Efficiency of Web Computing 

The equivalence between WTM and ITM/A computations was proved by means of
simulations. Because we were primarily interested in characterizing the
computing power of WTM’s, no attempt was made to make the simulations as
efficient as possible or to relate the complexity classes of the two models.
In this section we investigate the computational efficiency of ‘web space’ and 
‘web time’. 

To avoid repeating assumptions in the theorems below, we bound
the growth of the ‘parameters’ of a WTM over time, viz. its number of sites, the
sizes of its site descriptions, and the message transfer times. More precisely:

Definition 9. A WTM $G = (\alpha, \delta, \mu)$ is called S(t)-bounded if it satisfies the
following restrictions:

— the space complexity of G is $S(t)$, for all $t > 0$,

— the address length of any site in G does not grow faster than the space
complexity of G, i.e., for all $t > 0$ and any address $i \in \alpha(t)$ we have $|i| = O(S(t))$,

— the size of any site encoding does not grow faster than the space complexity
of G, i.e., for $t > 0$ we have $|\langle M_j \rangle| = O(S(t))$ for all $j \in \alpha(t)$, and

— the message transfer times are not greater in order of magnitude than the
total size of G, i.e., for all $t > 0$ and $i, j \in \alpha(t)$ we have $\mu(i,j,t) = O(|\alpha(t)|)$.

The first restriction bounds the space complexity of each site. The second one
allows at most an exponential growth (in terms of S(t)) of the synchronous
WTM with time, since addresses of length O(S(t)) can distinguish at most
$2^{O(S(t))}$ sites. The third restriction is also quite realistic; it says that the
‘program size’ at a site should not be greater than the size of the other data
permanently stored at that site. The fourth restriction together with the second
one guarantees that the size (e.g. in binary) of the values of the message transfer
function is also bounded by O(S(t)). Note that by the first restriction the message
length is also bounded by O(S(t)), since no longer messages can be prepared in
the given space.
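The claim that restrictions two and four bound the binary size of the values of $\mu$ by $O(S(t))$ can be spelled out as follows. The derivation assumes binary addresses and writes $c$ and $d \ge 1$ for the constants hidden in the two $O$-bounds (names introduced only for this calculation), with $S(t) \ge 1$; distinct sites have distinct addresses, so $|\alpha(t)|$ is at most the number of addresses of length at most $cS(t)$:

$$|\alpha(t)| \le \sum_{\ell=0}^{cS(t)} 2^{\ell} < 2^{cS(t)+1},$$

$$\mu(i,j,t) \le d\,|\alpha(t)| < d \cdot 2^{cS(t)+1},$$

$$\lfloor \log_2 \mu(i,j,t) \rfloor + 1 \le cS(t) + 2 + \log_2 d = O(S(t)).$$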

In the complexity calculations that follow we will always consider an S(t)-bounded,
or ‘bounded’, WTM. We will first show that any bounded WTM is
equivalent to an exponential space-bounded deterministic ITM using an
exponential-size advice.

Theorem 6. For all space bounding functions $S(t) > 0$,

$$\bigcup_{c>0} \text{WTM-DSPACE}(cS(t)) = \bigcup_{c>0} \text{ITM-DSPACE}(c^{S(t)})/\text{advice}(c^{S(t)}).$$

In particular,

$$\text{WTM-DLOGSPACE} = \text{ITM-PSPACE}/\text{poly}.$$

Proof (sketch): The left-to-right inclusion is proved by keeping a ‘mirror image’
of the WTM on the tapes of the ITM/A. Since our WTM is S(t)-space bounded,
it can have up to $c^{S(t)}$ sites of size S(t), for some c. Hence the mirror image
of the WTM can be maintained in exponential space, as claimed. For each t the
advice size is also bounded by the same expression, and the ITM/A uses the advice
to simulate the WTM updates. To prove the opposite inclusion we simulate the
i-th cell of the ITM/A by a special site that keeps the contents of this cell plus
the information whether the machine’s head is scanning this cell. The advice
tape is represented in a similar manner. For the full proof see [15]. □
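The exponential space bound in the left-to-right inclusion is a short calculation. Assume at most $c^{S(t)}$ sites, the exponential bound used in the proof, and assume that the description and contents of each site take at most $c'S(t)$ cells for a constant $c' \ge 1$ (a constant introduced here only for the calculation). Then, for $S(t) \ge 1$, using $S(t) \le 2^{S(t)}$ and $c' \le c'^{S(t)}$:

$$c^{S(t)} \cdot c' S(t) \;\le\; c'^{S(t)} \cdot c^{S(t)} \cdot 2^{S(t)} \;=\; (2cc')^{S(t)},$$

so the mirror image fits into $\hat{c}^{S(t)}$ cells with $\hat{c} = 2cc'$, which is of the form appearing on the right-hand side of Theorem 6.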

This result for space-bounded WTM computations is analogous to similar
results known for so-called synchronized computations in uniform models (see for
example [5], [18]).

Next we study ‘time’ as a computational resource for WTM’s, viz. the
potential of a WTM to perform parallel computations. In order to make use of this
potential one has to ensure, e.g. when sending requests to two sites to run some
computations, that these requests will be accomplished with only a small delay.
Similarly, after the computation has finished, one has to ensure that both results
will be returned, again with only a small delay. This cannot be guaranteed under
the original mild assumption that each message will be delivered in a finite,
albeit unpredictable, time.

In order to enable a genuinely parallel realization of computations we therefore
strengthen the restriction on the duration of message deliveries within a
bounded WTM further. We introduce the unit-cost WTM, in which each
message is assumed to be delivered to its destination within unit time.

Definition 10. A unit-cost WTM G is a bounded WTM $G = (\alpha, \delta, \mu)$ in which
$\mu(i,j,t) = 1$ for all $i, j \in \alpha(t)$ and $t > 0$.

Let $\text{WTM}_u\text{-PTIME}$ denote the class of all translations that can be realized
by a unit-cost WTM within polynomial reaction time.

It turns out that a unit-cost WTM can simulate an ITM/A very fast, by
involving an exponential number of sites in the simulation. Conversely, a fast
WTM with ‘many processors’ can be simulated by an ITM/A in ‘small’ space.
The simulation is sketched in the proof of the following theorem. When speaking
about the respective models we use input/output conventions similar to
those in Section 8.

Theorem 7.

$$\text{WTM}_u\text{-PTIME} = \text{ITM-PSPACE}/\text{poly}.$$

Proof (sketch): When attempting to simulate a polynomial time-bounded WTM
in polynomial space on an ITM/A, we run into the problem that a WTM can
activate an exponential number of processors whose representations cannot all
be kept on a tape simultaneously. Thus, a strategy must be designed for reusing
the space and recomputing the contents of each site when needed. The
non-uniformity of WTM updates is simulated, as expected, by advice calls.

To simulate an ITM/A of polynomial space complexity S(t) on a WTM of
polynomial time complexity, imagine the infinite computational tree T of the
ITM/A computations. For a given input, consider the subtree of T of depth …
with the same root as T and an exponential number of nodes. The simulated
ITM/A processes the first t inputs successfully iff there is a path in T that starts
in an initial ID and ends in an ID that produces the ‘further’ output which is the
reaction to the t-th input. The existence of such an accepting path is determined
by means of the parallel version of the algorithm that computes the transitive
closure of T in polynomial space w.r.t. S(t). This simulation must be run for each
$t = 1, 2, \dots$. This is achieved by starting, at each time t, the simulation for that
particular value of t at a suitable site of the WTM. The involved details of both
parts of the sketched proof can be found in [15]. □
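The reachability question at the heart of the second part of the proof can be illustrated with the standard transitive-closure-by-repeated-squaring technique: if a configuration graph has N nodes, about $\lceil \log_2 N \rceil$ Boolean matrix squarings suffice, and every entry of each product can be computed independently of the others, which is what makes the method suitable for many processors. The Python sketch below performs the squarings sequentially on a small, explicitly given graph; it illustrates the technique only and is not the construction of [15], in which the graph is far too large to be written down.

```python
from typing import List

Matrix = List[List[bool]]


def bool_square(m: Matrix) -> Matrix:
    """One squaring step: r[i][j] holds iff i -> k -> j for some node k.
    Each entry depends only on row i and column j of m, so all n*n entries
    could be computed in parallel."""
    n = len(m)
    return [[any(m[i][k] and m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def reachability(adj: Matrix) -> Matrix:
    """Reflexive-transitive closure of a directed graph by repeated squaring."""
    n = len(adj)
    # Self-loops make a path of length <= L survive squaring as length <= 2L.
    m = [[adj[i][j] or i == j for j in range(n)] for i in range(n)]
    covered = 1  # m currently contains all paths of length <= covered
    while covered < n - 1:
        m = bool_square(m)
        covered *= 2
    return m
```

For a chain of configurations 0 → 1 → 2 → 3 plus an isolated node 4, two squarings suffice: node 3 becomes reachable from node 0, while node 4 stays unreachable from it.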

Corollary 1. For any $S(t) > 0$,

$$\bigcup_{c>0} \text{WTM-DSPACE}(cS(t)) = \bigcup_{c>0} \text{WTM}_u\text{-TIME}(c^{S(t)}).$$

In particular,

$$\text{WTM-DLOGSPACE} = \text{WTM}_u\text{-PTIME}.$$

The last result says that (bounded!) WTM’s make use of their space in 
an optimal way: in the given space one cannot perform more time-bounded 
computations than a unit-cost WTM does. 

The result on time-bounded unit-cost WTM’s and its proof mirror the
similar result for uniform, idealized computational models from the ‘second machine
class’ (cf. [11]). Its members fulfill the so-called Parallel Computation Thesis,
which states that sequential polynomial space is equivalent to parallel
polynomial time on devices from this class. In the non-uniform setting similar results
are known for infinite families of neural networks of various kinds (for a recent
overview of known results see [8]).

The results on time and space efficiency of WTM computations rank WTMs 
among the most powerful and efficient computational devices known in complex- 
ity theory. 



10 Afterthoughts 

Theorem 1 points to a rich world of interactive devices that can serve as a basis 
for the investigation of evolving interactive computing systems. In fact, in [20] 
these devices have been interpreted as cognitive automata. Each of them can 
serve as a model of a ‘living organism’ and can be used for further studies of 
the computational aspects of complex systems created from these elementary 
computing units. 

Theorem 3 reveals that not all cognitive automata are equally efficient from
the viewpoint of their descriptional economy: systems that represent their
configuration space in unary (as finite automata do) suffer from space
inefficiency. Other systems that make use of more efficient state representations
and can reuse their space, such as neural nets, are much more effective
from this point of view.

Theorems 2, 4, and 5 point to the central equivalence of the various models,
summarized in the following theorem. It shows that the notion of
evolving interactive computing is a robust and fundamental one.

Theorem 8. For translations $\phi$ of infinite streams, the following are equivalent:

(a) $\phi$ is (sequentially) computable by a sequence of IFA’s with global states.

(b) $\phi$ is (sequentially) computable by an ITM/A.

(c) $\phi$ is (sequentially) computable by a site machine.

(d) $\phi$ is computable by a WTM.

Lemma 1 shows the super-Turing computing power of ITM/A’s and
separates the model from ITM’s. It also implies that the WTM, which can be
seen as a quite realistic model of the Internet as far as its computing power is
concerned, can perform computations that cannot be replicated by any standard
interactive TM. This answers Wegner’s claim concerning the power of interactive
computing. Interaction can lead to computations that no Turing machine can
mimic, provided that we allow updates of the underlying machinery and consider
unbounded, potentially infinite computations. The latter condition is crucial,
since otherwise there would be only a finite number of updates, which could
be built into the architecture of the ITM beforehand.

Theorem 8 also points to the various ways in which non-uniform features 
may enter into a computing system. First, non-uniformity can be hidden in 
the ‘architecture’ of a computing system. This is the case in sequences of finite 
automata with global states where the description of the system as a whole is 
given by an infinite, in general non-computable string. Second, non-computable 
information may enter into a computing system from an ‘external’ source. E.g. 
in the case of ITM/A’s, this is done by advice functions, and in the case of site 
machines or WTM’s there are agents that can change the machine architecture 
in an unpredictable manner. 

The results from Theorem 8 have interesting interpretations in the world of
cognitive automata and computational cognition. The basic idea is as follows: 
in a ‘robotic’ setting, any interactive finite automaton (viewed as a cognitive 
automaton) can be seen as a simple model of a living organism. A (finite) set 
of cognitive automata can communicate basically in two different ways. First, 
the automata can communicate using a fixed ‘pre-wired’ communication pat- 
tern, so to speak holding hands with their physically immediate neighbors. By 
this we get systems equivalent to sequences of interactive finite automata. If the 
automata communicate in arbitrary patterns or do not have global states, the 
corresponding system of cognitive automata resembles certain types of amor- 
phous computing systems [1]. In principle it is no problem to define cognitive 
automata in such a way that they will also possess a replication ability. Then 
one can consider systems of cognitive automata that grow while computing. As 
a result one gets various morphogenetic computational systems. 

The second possibility for cognitive automata to communicate is the case
when the automata are equipped with sensors and effectuators by which they 
scan and change their environment. In the case of ordinary TM’s the ‘living 
environment’ of a single cognitive automaton working under such conditions is 
modelled by TM tapes and read/write heads. Following this analogy further, a 
WTM can be seen as a set of cognitive automata. They share the same living 
environment and communicate via message exchange. They can even move and
exchange messages, either by encountering each other, by leaving a message
elsewhere (probably in a distinguished place) in the environment, or by sending a
message via a chain of neighbors. A specific view of a human society as a
community of agents communicating by whatever reasonable means (language,
e-mail, letters, messengers, etc.) also leads to a model of WTM with specific
parameters. What is important and interesting from the point of view of cognitive
sciences is the fact that, irrespective of which possibility is taken, we always get
a system equivalent to a WTM and hence in general possessing a super-Turing
computing power.
A challenging question still remains unanswered: could one indeed make use
of the super-Turing potential of the underlying machines to one’s advantage? 
Can one solve certain concrete undecidable problems by such machines? The 
answer is, (un)fortunately, no. From a practical point of view our results mean 
that the corresponding devices cannot be simulated by standard TM’s working 
under a classical scenario. This is because the evolving interactive machinery 
develops in an unpredictable manner, by a concurrent unpredictable activity of 
all agents operating the sites. 

Nevertheless, the above results point to quite realistic instances where the
classical paradigm of a standard Turing machine as the generic model which
captures all computations by digital systems is clearly insufficient. It appears
that the time has come to reconsider this paradigm and to replace it with its
extended version, viz. with ITM’s with advice. For a more extended discussion of
the related issues, see [14].
1 Introduction

Most of the concerns of Distributed Computing may appear in settings that are
quite different from its traditional application areas, such as distributed systems,
data and communication networks, etc. An important setting of this type is
that of autonomous mobile robots.

In fact, the attention in robotics research has moved from the study of very
few, specialized, rather complex robots to a multiplicity of generic, simple units.
This shift in attention has occurred both in the robotics engineering and in the
artificial intelligence communities. Leading research activities in the engineering
area include the Cellular Robotic System (CEBOT) of Kawaguchi et al. [12], the
Swarm Intelligence of Beni et al. [4], the Self-Assembly Machine (“Fractum”)
of Murata et al. [14], etc. In the AI community there have been a number of
remarkable studies, e.g., on social interaction leading to group behavior by Mataric
[13], on selfish behavior of cooperative robots in animal societies by Parker [16],
and on primitive animal behavior in pattern formation by Balch and Arkin [3], to
cite just a few. An investigation with an algorithmic flavor has been undertaken
within the AI community by Durfee [6], who argues in favor of limiting the
knowledge that an intelligent robot must possess in order to be able to coordinate
its behavior with others.

The motivations for this research shift are different, ranging from economic
concerns (simpler robots are less expensive to design, produce and deploy) to
philosophical questions (can complex behavior emerge from the interaction of
extremely simple entities?). The setting being considered is a community of
identical, extremely simple mobile robots which, collectively, may be capable of
performing a given task. In all these investigations, the algorithmic (i.e., the
computational and software) aspects were implicitly an issue, but clearly
not a major concern. We now provide an overview of some recent algorithmic
developments on the coordination and control of such a community of autonomous
mobile robots. These developments stem from research on the algorithmic
limitations of what a community of such robots can do.

2 Overview 

The algorithmic research has been carried out by two groups. The earliest
and pioneering work is that of Suzuki and Yamashita and their collaborators
[1,15,18,17], who established several results under the assumption that
movement (as well as any other robot’s action) is instantaneous. The more recent
research work is by Flocchini et al. [7,8,9], who remove this assumption. We now
describe their general model.

Each robot is capable of sensing its immediate surrounding, performing com- 
putations on the sensed data, and moving towards the computed destination; its 
behavior is an (endless) cycle of sensing, computing, moving and being inactive. 
The robots are viewed as points, and modeled as units with computational ca- 
pabilities, which are able to freely move in the plane. They are equipped with 
sensors that let each robot observe the positions of the others and form its local 
view of the world. This view includes a unit of length, an origin (which we will
assume w.l.o.g. to be the position of the robot in its current observation), and a
Cartesian coordinate system with origin, unit of length, and the directions of two
coordinate axes, identified as x axis and y axis, together with their orientations,
identified as the positive and negative sides of the axes. Notice, however, that
the local views could be totally different, making it impossible for the robots to
agree on directions or on distances.
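To make the notion of a local view concrete, the following Python sketch maps global positions into the private coordinate system of one robot. The parameters are illustrative assumptions: `angle` is the rotation of the robot's x axis relative to a global frame, `unit` is its private unit of length, and `mirrored` flips its y axis (a different orientation, i.e. handedness). The global frame exists only in the simulation; the robots themselves never know it, which is exactly why their views need not agree on directions or distances.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def local_view(me: Point, others: List[Point], angle: float, unit: float,
               mirrored: bool) -> List[Point]:
    """Express global positions in the private frame of the robot at `me`:
    origin at the robot, x axis rotated by `angle` radians from the global
    x axis, lengths measured in multiples of `unit`, and the y axis
    optionally flipped."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    view: List[Point] = []
    for (x, y) in others:
        dx, dy = x - me[0], y - me[1]           # translate: the robot is the origin
        lx = (dx * cos_a + dy * sin_a) / unit   # project onto the robot's x axis
        ly = (-dx * sin_a + dy * cos_a) / unit  # project onto the robot's y axis
        view.append((lx, -ly if mirrored else ly))
    return view
```

Two robots observing the same configuration with different `angle`, `unit` and `mirrored` values obtain snapshots that differ by a rotation, a scaling and possibly a reflection, but agree on the shape of the configuration.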

The robots are anonymous, meaning that they are a priori indistinguishable
by their appearances, and they do not have any kind of identifiers that can be
used during the computation. Moreover, there are no explicit direct means of
communication. The robots are fully asynchronous: the amount of time spent in
observation, in computation, in movement, and in inaction is finite but otherwise
unpredictable^. In particular, the robots do not (need to) have a common
notion of time. The robots may or may not remember previous observations or
computations performed in the previous steps; they are said to be oblivious if
they do not remember.

The robots execute the same deterministic algorithm, which takes as input 
the observed positions of the robots within the visibility radius, and returns a 
destination point towards which the executing robot moves. 

A robot is initially in a waiting state (Wait); asynchronously and independently
from the other robots, it observes the environment in its area of visibility
(Look); it calculates its destination point based only on the observed locations
of the robots in its area of visibility (Compute); it then moves towards that point
(Move); after the move it goes back to a waiting state.

The sequence: Wait - Look - Compute - Move will be called a computation 
cycle (or briefly cycle) of a robot. 

The operations performed by the robots in each state are now described
in more detail.

1. Wait. The robot is idle. A robot cannot stay infinitely idle (see Assumption
A1 below).

2. Look. The robot observes the world by activating its sensors, which return
a snapshot of the positions of all other robots with respect to its local
coordinate system. (Since robots are viewed as points, their positions in the
plane are just the set of their coordinates.)

^ Suzuki and Yamashita assume instead that every action is instantaneous.
3. Compute. The robot performs a local computation according to its
deterministic, oblivious algorithm. The result of the computation is a destination
point; if this point is the current location, the robot stays still (null
movement).

4. Move. The robot moves towards the computed destination; this operation
can terminate before the robot has reached it^. The movement cannot be
infinite, nor infinitesimally small.
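The Wait-Look-Compute-Move cycle and the asynchrony assumptions can be illustrated by a small event-driven Python simulation. It is a sketch under several assumptions that are not part of the model description: an adversarial scheduler is approximated by a seeded random one that activates one robot at a time; positions are kept in a single global frame for simplicity; a Move is applied as one atomic jump (so others never observe a robot in mid-move), covering a random part of the way but at least a fixed distance `delta`, so that it is neither infinite nor infinitesimally small; and the Compute rule `to_centroid` (head for the centre of gravity of the snapshot) is a hypothetical placeholder, not one of the algorithms of [7,8,9,18]. Because the phases of different robots interleave, a robot may move on the basis of a snapshot that is already stale.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
PHASES = ("Wait", "Look", "Compute", "Move")


def to_centroid(me: Point, snapshot: List[Point]) -> Point:
    """Hypothetical Compute rule: head for the centre of gravity of the snapshot."""
    n = len(snapshot)
    return (sum(p[0] for p in snapshot) / n, sum(p[1] for p in snapshot) / n)


def run_robots(positions: List[Point], compute: Callable[[Point, List[Point]], Point],
               steps: int, delta: float = 0.01, seed: int = 0) -> List[Point]:
    """Asynchronous Wait-Look-Compute-Move cycles driven by a random scheduler."""
    rng = random.Random(seed)
    pos = list(positions)
    phase = [0] * len(pos)                       # every robot starts in Wait
    snapshot: List[Optional[List[Point]]] = [None] * len(pos)
    target: List[Optional[Point]] = [None] * len(pos)
    for _ in range(steps):
        r = rng.randrange(len(pos))              # the scheduler activates one robot,
        phase[r] = (phase[r] + 1) % len(PHASES)  # which enters its next phase
        if PHASES[phase[r]] == "Look":
            snapshot[r] = list(pos)              # may be stale by the time r moves
        elif PHASES[phase[r]] == "Compute":
            target[r] = compute(pos[r], snapshot[r])
        elif PHASES[phase[r]] == "Move":
            (x, y), (tx, ty) = pos[r], target[r]
            dist = math.hypot(tx - x, ty - y)
            if dist > 0:
                # a random part of the way, but at least delta (unless the goal is closer)
                step = min(dist, max(delta, rng.random() * dist))
                pos[r] = (x + (tx - x) * step / dist, y + (ty - y) * step / dist)
        else:                                    # back in Wait: oblivious, forget everything
            snapshot[r] = None
            target[r] = None
    return pos
```

With this placeholder rule every destination lies in the convex hull of the current positions, so the robots never leave the hull of their initial positions; whether they actually gather is a separate question that the sketch does not settle.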

3 Results 

3.1 Pattern Formation with Unlimited Visibility 

Consider the coordination problem of forming a specific geometric pattern in the 
unlimited visibility setting. The pattern formation problem has been extensively 
investigated in the literature (e.g., see [5,17,18,19]), where usually the first step is 
to gather the robots together and then let them proceed in the desired formation 
(just like a flock of birds or a troupe of soldiers). The problem is practically
important because, if the robots can form a given pattern, they can agree on
their respective roles in a subsequent, coordinated action.

The geometric pattern is a set of points (given by their Cartesian coordinates)
in the plane, and it is initially known by all the robots in the system.

The robots form the pattern if, at the end of the computation, the positions
of the robots coincide, in everybody’s local view, with the points of the pattern,
where the pattern may be translated, rotated, scaled, and flipped into its mirror 
position in each local coordinate system. Initially, the robots are in arbitrary 
positions, with the only requirement that no two robots be in the same position, 
and that, of course, the number of points prescribed in the pattern and the 
number of robots are the same. 
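Whether a configuration forms the pattern in the sense just defined can be checked by testing similarity up to translation, rotation, uniform scaling, and reflection. The following Python sketch does this with complex numbers: after both point sets are centred on their centroids, every similarity becomes multiplication by a nonzero complex factor, possibly preceded by complex conjugation (the flip). The absolute tolerance `eps`, the greedy matching, and the brute-force search over candidate factors are simplifying assumptions; the sketch is a checker for the goal condition, not an algorithm that the robots run.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _centered(points: List[Point]) -> List[complex]:
    zs = [complex(x, y) for x, y in points]
    c = sum(zs) / len(zs)
    return [z - c for z in zs]


def _same_set(a: List[complex], b: List[complex], eps: float) -> bool:
    """Multiset equality up to eps, by greedy matching (adequate when the
    pattern points are far apart compared with eps)."""
    rest = list(b)
    for z in a:
        match = next((w for w in rest if abs(z - w) <= eps), None)
        if match is None:
            return False
        rest.remove(match)
    return True


def forms_pattern(robots: List[Point], pattern: List[Point], eps: float = 1e-9) -> bool:
    """True iff `robots` equals `pattern` up to a similarity (incl. reflection)."""
    if len(robots) != len(pattern):
        return False
    p, q = _centered(robots), _centered(pattern)
    k = max(range(len(p)), key=lambda i: abs(p[i]))  # robot farthest from centroid
    if abs(p[k]) <= eps:                              # degenerate: all robots coincide
        return all(abs(w) <= eps for w in q)
    for candidate in (p, [z.conjugate() for z in p]):  # without and with a flip
        for target in q:
            if abs(target) <= eps:
                continue  # a similarity fixes the centroid, so p[k] cannot go there
            factor = target / candidate[k]  # rotation and scaling taking p[k] to target
            if _same_set([factor * z for z in candidate], q, eps):
                return True
    return False
```

For example, a unit square that is rotated by 45 degrees, scaled by 2 and translated still forms the square pattern, a 2-by-1 rectangle does not, and a mirrored right triangle forms the original triangle because flips are allowed.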

The pattern formation problem is quite a general member in the class of 
problems that are of interest for autonomous, mobile robots. It includes as special 
cases many coordination problems, such as leader election, where the pattern is 
defined in such a way that the leader is uniquely represented by one point in the 
pattern. 

Suzuki and Yamashita [18] solve this problem with instantaneous movements, 
characterizing what kind of patterns can be formed. All their algorithms are non- 
oblivious. 

Without instantaneous movements, the following theorem summarizes the 
results holding for a set of n autonomous, anonymous, oblivious, mobile robots: 

Theorem 1. ([7]) 

1. With common knowledge of two axis directions and orientations, the robots
can form an arbitrary given pattern.

2. With common knowledge of only one axis direction and orientation, the
pattern formation problem is unsolvable when n is even.

^ e.g. because of limits to the robot’s motorial autonomy. 
3. With common knowledge of only one axis direction and orientation, the robots
can form an arbitrary given pattern if n is odd.

4. With no common knowledge, the robots cannot form an arbitrary given
pattern.

The class of patterns which can be formed with common knowledge of only
one axis direction and orientation when n is even (i.e., when the arbitrary pattern
formation problem is unsolvable) has been fully characterized [9].



3.2 Flocking with Unlimited Visibility 

Consider flocking: a set of mobile units is required to follow a leader unit while
keeping a predetermined formation (i.e., they are required to move in a flock,
like birds or a group of soldiers).

Flocking has been studied for military and industrial applications [2,3,19], 
assuming that the path of the leader is known to all units in advance. 

The more interesting and challenging problem arises when the units in the
flock do not know beforehand the path the leader will take; their task is just to
follow it wherever it goes, and to keep the formation while moving. An
algorithmic solution, assuming neither a priori knowledge of the path nor its
derivability, has recently been presented in [10]; the algorithm only assumes that
the robots share a common unit of distance, and no common coordinate system is
needed.



3.3 Gathering with Limited Visibility 

Consider gathering: the basic task of having the robots meet in a single location
(the choice of the location is not predetermined). Since the robots are modeled as
points in the plane, the task of robots gathering is also called the point formation
problem. Gathering (or point formation) has been investigated both experimentally
and theoretically in the unlimited visibility setting, that is, assuming that
the robots are capable of sensing (“seeing”) the entire space (e.g., see [7,11,17,18]).
In general, and more realistically, robots can sense only a surrounding within a
radius of bounded size. This setting, called the limited visibility case, is
understandably more difficult, and only a few algorithmic results are known [1,18].

In the limited visibility setting, this problem has been investigated by
Ando et al. [1], who presented an oblivious procedure that converges to the
point in the limit but does not reach it. Furthermore, instantaneous actions are
assumed.

In [8], Flocchini et al. proved that the availability of orientation^ enables 
anonymous oblivious robots with limited visibility to gather within a finite num- 
ber of moves even if they are fully asynchronous. 

^ i.e., agreement on axes and directions (positive vs. negative) of a common coordinate
system, but not necessarily on the origin or on the unit distance.
This shows that gathering can be performed in a finite number of moves by
simpler robots with fewer restrictions than known before, provided they have a 
common orientation. 

From a practical point of view, this result has immediate consequences. In 
fact, it solves the problem without requiring the robots to have physically un- 
realizable motorial and computing capabilities (“instantaneous actions”), and 
using instead a property (“orientation”) which is both simple and inexpensive 
to provide (e.g., by a compass). 

4 Open Problems 

The known results are few and open many interesting research directions, among 
them: 

the study of coordination problems under relaxed assumptions about the robots’ 
capabilities (e.g., the robots are not totally oblivious, and can remember only 
a constant amount of information); 

the investigation of simple tasks under different conditions of the environment
(e.g., in the presence of obstacles, or on uneven terrains);

the study of the impact that sensorial errors, possibly arising during the Look
and the Move state, have on the overall correctness of the algorithms;

the design of new algorithms under different assumptions on the visibility power
of the robots (e.g., the accuracy of the robots’ ability to detect the other
robots’ positions decreases with the distance);

the study of new problems like scattering (i.e., the robots start from the same
location and their goal is to evenly scatter on the plane), rescue (i.e., the
robots have to find a small object which is not initially visible), exploration
(i.e., the robots have to gather information about the environment, with the
purpose, for example, of constructing a map), and many more.

At a more general level, there are interesting fundamental research questions.
For example, how should different solutions for the same set of robots be
compared? So far, no complexity measure has been proposed; such a definition is
part of our future research.
Abstract. Functional validation of hardware designs is a major challenge
for circuit design companies. Post-delivery software problems can
be addressed by subsequent software releases; however, fixing hardware
bugs in any shipped product is expensive. Simulation remains the dominant
functional validation method, but in the last decade, formal verification
(FV) has emerged as an important complementary method. We
describe basic FV methods: theorem proving, model checking, and equivalence
checking, with some illustrations from their applications to Alpha
microprocessor designs. The last one is described in detail. Although
FV can theoretically provide much more complete verification coverage
than simulation, our ability to apply FV is limited by the capacity
of existing FV tools and by the availability of trained personnel. The
application of FV to industrial designs is an active research area with huge
opportunities for academic and industrial researchers.



1 Introduction 

The difficulty of validating modern microprocessor designs has increased
dramatically during the last two decades as the complexity of microprocessor
circuits has increased. Increasing transistor density (which quadruples every
three years), device speed, and die size impact all aspects of design. Designers
must correctly target future technology so they produce the right design at the
right time. Besides technology changes, chip designers are driven by changes in
the competitive marketplace, where emerging applications may require new
functionality, redistribution of the resources, higher bandwidth, more parallelism,
etc. All these factors influence the microprocessor architecture and its final
design. Rising shipment volumes increase the repair costs that a company bears in
case of a flawed design. Designers experience tremendous pressure when coping with
two conflicting requirements: engineering a correct design and meeting a short
development schedule. Under these circumstances, design validation is a real
challenge.

A correct design has to obey various physical design rules while providing 
the required functionality. In this paper, we will discuss only a small subset 
of validation methods; specifically, functional verification, further restricted to 
formal verification (FV) methods. 

The goal of functional verification is to assure the logical correctness of a 
design, exclusive of physical requirements like timing and power usage. In this 
paper, we only address the design verification problem using formal methods,
and we do not discuss other verification methods like simulation and
post-manufacturing testing. This paper is not intended to be an overview of all
formal verification methods, nor does it provide an exhaustive description of the way
formal methods can be applied in an industrial design flow. Note that design
methods may be quite different from company to company, and even from project
to project within a single company. In some cases, different methods may be
used on different parts of the same design and in different phases of the project.
Our goal is to give you the flavor of industrial hardware verification problems, 
and how some of them were addressed using FV methods in Alpha Development 
Group. 

The author would like to share her experience of moving from academia, quite
naive and a bit arrogant, to the very exciting and creative environment of
microprocessor design and the complexity of its industrial engineering problems.
She soon discovered how difficult it is to create a tool that really works in all
(not just in most) cases. It takes a short time to implement the core of a tool,
but an enormous effort to get such a tool to work for all possible inputs, given
the unpredictability of its usage. Ideas that take a few days to conceive and
implement may take weeks, or even months, to refine to the point where they
provide stable, correct, and user-friendly operation.

The other lesson learned was that the most sophisticated solution is not always
the best. For example, one of the fastest satisfiability checkers, Chaff [14], has
a very simple implementation, but it is tailored to run well on available
microprocessors. It outperforms more complex algorithms designed from the
results of long-term research. These observations do not invalidate the work of
theoreticians (without their effort, we would not have today's understanding
of the problem); they just confirm that there is a large gap between theoretical
results and their application.

This paper consists of two main parts: an overview of the main formal
verification methods deployed in industrial hardware verification, and a close look
at our in-house verification tools tailored to the needs of Alpha microprocessor
designers. Specifically, we focus on the BOVE verification tool [12] and the
features that make it a useful formal verification environment for Alpha
designers.



2 Formal Functional Verification 

Formal functional verification is the use of formal methods to establish the
correctness of a design in much the same way that simulation is used. The
difference is that instead of checking the results of a simulation, which are
concrete values, we check the results of a symbolic simulation, which are
equations. For instance, imagine we wish to verify the correctness of a 64-bit
adder, a very common device in a microprocessor design. We can simulate the
operation of this 64-bit adder design; however, checking all possible combinations
of its two 64-bit operands requires $2^{128}$ simulator runs! If this adder is
represented in an unambiguous hardware description language, we can derive
equations describing the functionality of the design, and prove formally that its
output is the sum of its inputs. The result of such formal verification is
equivalent to running 100% of the possible input combinations. Often, it is much
cheaper (in time, effort, and computing resources) to use formal verification than
simulation. However, formal methods can be used only where we have a rigorous
description of a model against which (a part of) a design can be compared.
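To make the numbers concrete, here is a minimal Python sketch (our own illustration, not part of the Alpha tool flow; the names ripple_carry_add and exhaustive_check are hypothetical) that exhaustively simulates a small bit-level ripple-carry adder against the reference a + b. For two n-bit operands the number of runs is 2^(2n), which is harmless for n = 4 and hopeless for n = 64.

```python
def ripple_carry_add(a: int, b: int, width: int) -> int:
    """Bit-level ripple-carry adder: returns (a + b) mod 2**width."""
    carry, result = 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i          # sum bit i (uses the incoming carry)
        carry = (x & y) | (carry & (x ^ y))     # carry into bit i + 1
    return result


def exhaustive_check(width: int) -> int:
    """Simulate every operand pair against the reference a + b.

    Returns the number of simulator runs, which is 2**(2 * width)."""
    mask = (1 << width) - 1
    runs = 0
    for a in range(1 << width):
        for b in range(1 << width):
            assert ripple_carry_add(a, b, width) == (a + b) & mask
            runs += 1
    return runs


print(exhaustive_check(4))   # 256 runs for a 4-bit adder
print(2 ** (2 * 64))         # runs needed for a 64-bit adder (about 3.4e38)
```

A formal proof reaches the same conclusion as the exhaustive loop without enumerating the operand pairs.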

Note that there are different formal methods, several of which we summarize
below, that provide different approaches to obtaining verification results. As
the paper progresses, we become more and more specific, finally describing the
BOVE equivalence checking tool and its use in verifying Alpha microprocessor designs.

2.1 Design Flow 

Before we start to talk about our verification methods, the reader should have a 
basic understanding of our design flow, which is shown in a simplified manner in 
Figure 1. At the top level, the architecture is defined by its instruction set. The 
Instruction Set Architecture (ISA) is usually specified by several hundred pages 
of English text, and in some places it is enhanced with pseudo-code or other 
pseudo-formal descriptions of its functionality. The Alpha Architecture Refer- 
ence Manual [2], which includes the ISA specification, has about 800 pages. It 
describes the influence of each operation on the architectural state, and how ex- 
ceptions and interrupts are treated. Each generation of Alpha microprocessor is 
obliged to comply with this specification and to preserve backward compatibility 
of the ISA. 

For each microprocessor development project, an executable model (usually
in some imperative language like C or C++) is written to measure the
performance impact of architectural decisions. This model is simulated more than
any other model, giving architects confidence in their ideas. New versions of such
Alpha models can be produced by augmenting the model of the previous
microprocessor generation with the newly required features. This model is tested
for months and is used throughout the design effort.

Another critical model is the description of the microarchitecture at the
register-transfer level (RTL). How closely this model relates to the physical
partition of the design differs from project to project. Our company uses its own
hierarchical hardware description language, named Merlin. The Merlin RTL model
is thoroughly simulated against the higher-level model, and is also subjected to
timing analysis and power estimation.

The RTL design is the specification for the transistor-level schematics. To
achieve the highest possible performance, the majority of each Alpha design has
been full custom; only small, non-critical portions of the design are created
by synthesis tools. After the transistor-level schematics are completed, they
are passed to layout engineers.
Fig. 1. Design flow and points where formal verification is applied



2.2 Verification Task 

The general verification goal is to assure that the result of the design process is 
correct. In this paper, we consider only functional correctness. To achieve this 
goal, we need to assure that no logical flaws (bugs) are introduced into a design 
as it moves through the design flow. In each design stage, the refined design is 
compared to the result produced by the previous stage. Bugs can be introduced 
in any of the design phases, and there are different phase-specific methods that 
help to eliminate them. Some of the validation methods are incomplete, but they 
are still useful in finding many bugs in the early stages of the design process. 



2.3 Formal Verification Techniques 

In this section, we briefly describe basic formal verification approaches, starting 
with high-level verification approaches and concluding with methods applied to 
the lower-level design artifacts. 



Theorem Proving. Let us consider an ideal situation in which our ISA is written
in a strict mathematical way, and our RTL language has clear semantics. In
that case, having the system and its specification formulated as logic formulae
allows its correctness to be established as a logical proof. Because of the size
of industrial hardware designs, this task cannot be done by hand. There are
several automated proof checkers for first-order and higher-order logics available,
e.g., HOL [27], PVS [38], Otter [36], and ACL2 [1]. Theorem proving has been
used successfully in a number of university projects, and also at AMD [44], Intel
[26], and Rockwell-Collins [31].

Unfortunately, we do not have this ideal situation. There are no formal
specifications available, and verification engineers have to write them by hand,
using informal specifications as a guide. This requires a lot of effort, as well as
the availability and willingness of architects to discuss ambiguous specifications.
Also, hardware description languages often have constructs that have no clear
semantics and are, therefore, hard to formalize. In better cases, it is possible to
translate RTL code into a formal language. This translation needs to be automatic,
because designs consist of hundreds of thousands of lines of code, and they change
frequently. The translation also needs to be straightforward enough not to
introduce even more bugs. Despite the huge progress in the development of
theorem provers, the application of theorem proving to RTL verification is far from
automatic. It has been used only in restricted ways: on a simple design that was
created with formal methods in mind [29], or on a part of a design, like a
floating-point arithmetic unit, that has historically had a formal (or close to
formal) specification [44,26]. Because of these problems and the scarcity of
skilled verification engineers, theorem-proving-based validation covers at most a
very small part of the chip design process. In our environment, theorem proving
was applied to the verification of network protocols [33].



Model Checking (MC) is an automatic verification technique for proving temporal
properties of finite-state systems. The idea of temporal logic is that the truth
value of a formula is dynamic with respect to time: it can be true in some states
of the model and false in others. Temporal logic allows a user to specify the
order of events in time, e.g., “if packet A arrives before packet B, it departs
the queue before B”, or “if signal A is set to 1, it remains set until B gets
reset”. An efficient algorithm for model checking was introduced independently
by Emerson and Clarke [21,16] and by Queille and Sifakis [42]. An excellent source
for results on the application of model checking is a recent textbook [18]. MC
started as a software verification technique and later found its way into hardware
verification, where it has been used successfully to verify RTL code.

The original MC algorithm was based on explicit state-space exploration;
therefore, it could handle only systems with a modest number of states. The big
breakthrough for MC was the introduction of BDDs (see Section 2.4) to represent
formulas and state sets; this kind of MC is called symbolic model checking.
Modern symbolic MC systems [47] are able to handle finite state machines with
very large state spaces, e.g., up to $2^{100}$ states. However, this is still a big
restriction for industrial-sized hardware: verifying a meaningful piece of
hardware requires tools with vastly more capacity.

Although MC is considered an automatic verification method, its application
involves a lot of engineering, because of the limited capacity of available model
checkers. There are different ways to reduce the state space, but any reduction
that is not done properly may mask errors, or even introduce spurious ones. When
model checking is used to find bugs rather than to prove their absence, reduction
is considered safe. Portions of the circuit can be reduced in a way that does not
change the behavior of the circuit with respect to the verified property.
Reductions can be done rigorously with the aid of a theorem prover.

Another effort required to use MC is modeling the environment in which the
hardware unit to be checked operates. Not all inputs to the unit are possible,
and a violation of a desired property for invalid inputs is irrelevant. Therefore,
engineers write transactors that model the environment of the unit so as to rule
out impossible behaviors. Once the transactors are written (which is hard work),
the model can be further reduced by other methods: constant propagation,
removal of redundant circuitry, reduction of state-holding elements, etc. Writing
properties and decoding failure traces can be non-trivial tasks that require good
knowledge of both the tools and the unit being verified.

The capacity limitation of MC has motivated the search for other approaches
to the verification task. Bounded model checking [10,9,46] is one result of this
effort. A fixed-length prefix of a computation is described by a formula that is
passed to a satisfiability checker; the existence of a satisfying assignment
indicates a problem within that prefix of the computation. Although incomplete,
this method is often fast at discovering bugs, and it has been used successfully
in our verification methodology [13]. An alternative MC method, known as
symbolic trajectory evaluation (STE) [48,5,28], uses symbolic simulation
techniques. An STE logic allows the formulation of simple temporal properties of
a transition system. Its advantage is that it is less sensitive to state explosion:
verifications of designs with 5000 or more latches, i.e., systems operating over
a state space of $2^{5000}$ states, have been reported. STE has been used
successfully by IBM, Motorola, and Intel to verify transistor-level and RTL
designs [51,39]. Our company used STE based on a SAT solver to prove simple
temporal properties of the Alpha memory subsystem [13].
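The idea of bounded model checking can be illustrated with a small Python sketch. A real bounded model checker encodes the k-step unrolling as a single propositional formula and hands it to a SAT solver; here, purely for illustration, we enumerate input sequences explicitly on a toy 3-bit counter, which plays the role of the satisfiability check. All names (bounded_check, counter_step, never_five) are hypothetical. As in the text, the method is incomplete: None only means that no violation exists within k steps.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple


def bounded_check(init: int,
                  step: Callable[[int, int], int],
                  prop: Callable[[int], bool],
                  inputs: Sequence[int],
                  k: int) -> Optional[Tuple[int, ...]]:
    """Search input sequences of length 0..k for one that leads from `init`
    to a state violating `prop`.

    Returns the shortest such sequence (a counterexample trace) or None.
    Enumerating sequences stands in for the single SAT call on the
    k-step unrolling that real bounded model checkers make."""
    if not prop(init):
        return ()
    for depth in range(1, k + 1):
        for seq in product(inputs, repeat=depth):
            state = init
            for x in seq:
                state = step(state, x)
            if not prop(state):
                return seq
    return None


# Toy design: a 3-bit counter with an enable input; the (hypothetical)
# property under check is "the counter never reaches 5".
def counter_step(state: int, enable: int) -> int:
    return (state + enable) % 8


def never_five(state: int) -> bool:
    return state != 5


print(bounded_check(0, counter_step, never_five, (0, 1), 4))  # None
print(bounded_check(0, counter_step, never_five, (0, 1), 6))  # (1, 1, 1, 1, 1)
```

The returned tuple is the failure trace that an engineer would then decode, which is where the fast turnaround of SAT-based methods pays off.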

Model checking and theorem proving may be considered complementary
technologies. Theorem provers provide formalisms expressive enough to specify an
entire processor, but they require manual guidance. The MC verification algorithm
is fully automatic, but its application is infeasible for designs with large
state spaces. It seems natural to combine these techniques into a system that
uses the strengths of both [45,32,3,24]. Currently, such systems are built either
on top of model checkers that use theorem proving to decompose the problem into
smaller subproblems, or, conversely, on top of a theorem prover that uses model
checking to prove subgoals.



Equivalence Checking is a method applied to the following task: given two
models of a circuit, prove that they have the same input-output behavior. This
task is relevant both for new designs that we want to verify against their
specifications, and for modified designs. The latter case occurs often in the later
stages of a design project, when designers get feedback from the timing and power
estimation tools and the designs undergo many changes. Equivalence checking may
also be applied to modified specifications.
Equivalence checking requires different approaches for combinatorial and
sequential circuits. A combinatorial circuit is an acyclic circuit without
state-holding elements. Each output can be described as a Boolean function of its
inputs, and the verification task is then reduced to the equivalence of Boolean
functions. The basic method used for combinatorial equivalence checking is the
construction of a canonical representation of those functions (see Section 2.4).
Non-equivalence can also be formulated as a satisfiability problem (is there an
input assignment on which the two functions differ?), which is useful for
finding bugs.
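The satisfiability formulation of non-equivalence is often drawn as a "miter": both circuits are fed the same inputs and their outputs are XORed, so any satisfying assignment of the XOR is a counterexample. The Python sketch below is illustrative only: exhaustive enumeration stands in for a SAT solver, and the three majority functions are hypothetical examples, not circuits from our designs.

```python
from itertools import product
from typing import Callable, Optional, Tuple

BoolFn = Callable[..., bool]


def find_mismatch(spec: BoolFn, impl: BoolFn, n: int) -> Optional[Tuple[bool, ...]]:
    """Look for an input on which spec and impl disagree.

    This asks whether the miter spec(x) XOR impl(x) is satisfiable;
    exhaustive enumeration plays the role of a SAT solver here."""
    for x in product((False, True), repeat=n):
        if spec(*x) != impl(*x):
            return x          # satisfying assignment = counterexample
    return None               # unsatisfiable: the functions are equivalent


def majority(a: bool, b: bool, c: bool) -> bool:
    return (a and b) or (a and c) or (b and c)


def majority_nand(a: bool, b: bool, c: bool) -> bool:
    # The same function built from NAND gates only.
    def nand(p: bool, q: bool) -> bool:
        return not (p and q)
    return not (nand(a, b) and nand(a, c) and nand(b, c))


def majority_buggy(a: bool, b: bool, c: bool) -> bool:
    return (a and b) or (a and c) or (b and a)   # the b AND c term is missing


print(find_mismatch(majority, majority_nand, 3))   # None
print(find_mismatch(majority, majority_buggy, 3))  # (False, True, True)
```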

Even though most circuits are not combinatorial, in many situations, such as
when the modifications are made within latch boundaries, or when the design
methodology preserves the state encoding of the machines, the verification problem
can be decomposed into many combinatorial equivalence checking tasks. Verity is
an example of a tool based on this strategy [50]. However, our methodology does
not bind designers to keep the state encoding. Therefore, we have to deal with
sequential circuits that are modeled by finite state machines.

The finite state machines considered in this paper are deterministic Finite State
Machines (FSMs) of Mealy type [30]. An FSM is a 6-tuple

$$(I, S, \delta, S_0, O, \lambda)$$

where: $I$ is an input alphabet,
       $S$ is a finite set of states,
       $\delta : S \times I \to S$ is a transition function,
       $S_0 \subseteq S$ is a set of initial states,
       $O$ is an output alphabet, and
       $\lambda : S \times I \to O$ is an output function.

A nondeterministic FSM can be defined similarly, by replacing the transition
function with a transition relation. Nondeterministic FSMs can be used to model
systems that are incompletely specified or have unpredictable behavior.
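The 6-tuple translates directly into a data structure. The following Python sketch is a hypothetical illustration (the class MealyFSM and its field names are ours): it stores the components of the tuple and runs the machine on an input sequence, using a rising-edge detector as the example machine.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable, List, Sequence


@dataclass(frozen=True)
class MealyFSM:
    """Deterministic Mealy machine (I, S, delta, S0, O, lambda)."""
    inputs: FrozenSet[Hashable]                      # I
    states: FrozenSet[Hashable]                      # S
    delta: Callable[[Hashable, Hashable], Hashable]  # S x I -> S
    init: FrozenSet[Hashable]                        # S0, a subset of S
    outputs: FrozenSet[Hashable]                     # O
    lam: Callable[[Hashable, Hashable], Hashable]    # S x I -> O

    def run(self, s0: Hashable, xs: Sequence[Hashable]) -> List[Hashable]:
        """Output sequence produced from initial state s0 on input sequence xs."""
        if s0 not in self.init:
            raise ValueError("s0 must be an initial state")
        outs, s = [], s0
        for x in xs:
            outs.append(self.lam(s, x))  # Mealy: output depends on state and input
            s = self.delta(s, x)
        return outs


# Rising-edge detector: the state remembers the previous input bit.
edge = MealyFSM(
    inputs=frozenset({0, 1}),
    states=frozenset({0, 1}),
    delta=lambda s, x: x,
    init=frozenset({0}),
    outputs=frozenset({0, 1}),
    lam=lambda s, x: int(s == 0 and x == 1),
)
print(edge.run(0, [0, 1, 1, 0, 1]))  # [0, 1, 0, 0, 1]
```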

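Digital systems are modeled by FSMs with binary-encoded input and output
alphabets. Two FSMs with the same input and output alphabets and disjoint
sets of states are equivalent if, starting from their initial states, they produce
the same sequence of outputs for any sequence of inputs. Equivalence of digital
systems modeled by FSMs can also be formulated by means of their product machine.
Let

$$F^{(i)} = (\{0,1\}^n, S^{(i)}, \delta^{(i)}, S_0^{(i)}, \{0,1\}, \lambda^{(i)}), \quad i \in \{0,1\}$$

be two FSMs that model a system with $n$ input signals and one output signal.
The product machine $F = F^{(0)} \times F^{(1)}$ is the FSM

$$F = (\{0,1\}^n, S, \delta, S_0, \{0,1\}, \lambda)$$

where $S = S^{(0)} \times S^{(1)}$ is the Cartesian product of $S^{(0)}$ and $S^{(1)}$, and

$$\forall s_0 \in S^{(0)}, s_1 \in S^{(1)}, x \in \{0,1\}^n : \quad \delta((s_0, s_1), x) = (\delta^{(0)}(s_0, x), \delta^{(1)}(s_1, x))$$

$$S_0 = S_0^{(0)} \times S_0^{(1)}$$

$$\forall s_0 \in S^{(0)}, s_1 \in S^{(1)}, x \in \{0,1\}^n : \quad \lambda((s_0, s_1), x) = \big(\lambda^{(0)}(s_0, x) = \lambda^{(1)}(s_1, x)\big)$$

that is, the product machine outputs 1 exactly when the outputs of the two
machines agree.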
$F^{(0)}$ and $F^{(1)}$ are equivalent if their product machine always produces 1 as
output. This can be seen as a special case of proving an invariant, which is
a property that is expected to hold in all reachable states. Three operators are
used in reachability analysis: image, pre-image, and back-image. Let $R$ and $S$
be predicates, each representing a set of states, and let $\delta$ be the transition
function of a system. Then

$$\mathit{Img}(R, \delta) = S \text{ such that } S(x') = \exists x\, \exists y : R(x) \wedge (x' = \delta(x, y))$$

$$\mathit{Preimg}(S, \delta) = R \text{ such that } R(x) = \exists y : S(\delta(x, y))$$

$$\mathit{Backimg}(S, \delta) = R \text{ such that } R(x) = \forall y : S(\delta(x, y))$$
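Read over explicit state sets, the three operators differ only in their quantifiers: Img collects successors, Preimg keeps states with some successor in the target set, and Backimg keeps states whose successors are all in it. The Python sketch below is illustrative: it enumerates states explicitly, whereas the symbolic tools discussed in Section 2.4 compute the same sets on BDDs. The function names and the mod-4 counter are hypothetical.

```python
from typing import Callable, FrozenSet, Hashable, Iterable

State = Hashable
Delta = Callable[[State, Hashable], State]


def img(R: FrozenSet[State], delta: Delta, inputs: Iterable) -> FrozenSet[State]:
    """Img(R, delta): every x' = delta(x, y) with x in R, for some input y."""
    inputs = tuple(inputs)
    return frozenset(delta(x, y) for x in R for y in inputs)


def preimg(S: FrozenSet[State], delta: Delta, states: Iterable[State],
           inputs: Iterable) -> FrozenSet[State]:
    """Preimg(S, delta): states x with delta(x, y) in S for SOME input y."""
    inputs = tuple(inputs)
    return frozenset(x for x in states if any(delta(x, y) in S for y in inputs))


def backimg(S: FrozenSet[State], delta: Delta, states: Iterable[State],
            inputs: Iterable) -> FrozenSet[State]:
    """Backimg(S, delta): states x with delta(x, y) in S for EVERY input y."""
    inputs = tuple(inputs)
    return frozenset(x for x in states if all(delta(x, y) in S for y in inputs))


# Toy example: a mod-4 counter with an enable input y.
states = range(4)
step = lambda s, e: (s + e) % 4
print(sorted(img(frozenset({0}), step, (0, 1))))                  # [0, 1]
print(sorted(preimg(frozenset({2}), step, states, (0, 1))))      # [1, 2]
print(sorted(backimg(frozenset({2, 3}), step, states, (0, 1))))  # [2]
```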

From now on, we will not distinguish between a set and the predicate that
defines it. There are two basic methods for invariant checking, both based on
fixpoint computation. Forward reachability analysis follows all possible machine
steps from the initial states, using the Img operator, and checks whether the
invariant holds in all reachable states. In contrast, backward reachability is
based on the other two operators. It explores the state space starting either from
valid states (states where the invariant holds) or from invalid states (states
where the invariant does not hold). In the former case, it computes the set of
states from which every computation of a fixed length satisfies the invariant,
increasing this length iteratively until a fixpoint is reached, and then checks
that this set contains all initial states. In the latter case, it tries to prove
that none of the states from which the machine could get into an invalid state is
an initial state of the machine. Figure 2 shows heavily simplified algorithms for
invariant checking. Their efficient implementation has been studied intensively,
and that research has produced techniques that involve various tricks with
transition function and state set representation, decomposition, and approximation.
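The three procedures of Figure 2 can be sketched in Python over explicit state sets, and applied to the product-machine formulation of equivalence. This is a hypothetical illustration, not BOVE code: real implementations work on BDDs, and the example machines (a binary mod-4 counter against a one-hot counter with a different state encoding) are our own. Note that this sketch computes the Backward Traversal 2 step as F = V ∩ Backimg(R, δ), which is what the invariant semantics described in the Figure 2 caption requires; the valid set V consists of the product states whose output is 1 for every input.

```python
from itertools import product


def img(R, delta, inputs):
    return frozenset(delta(x, y) for x in R for y in inputs)


def preimg(S, delta, states, inputs):
    return frozenset(x for x in states if any(delta(x, y) in S for y in inputs))


def backimg(S, delta, states, inputs):
    return frozenset(x for x in states if all(delta(x, y) in S for y in inputs))


def forward_traversal(V, delta, S0, states, inputs) -> int:
    # `states` is unused here; it is kept for a uniform signature.
    R = N = frozenset(S0)
    while N and N <= V:                  # frontier non-empty and all valid
        T = img(N, delta, inputs)
        N = T - R                        # states reached for the first time
        R = R | T
    return 1 if not N else 0


def backward_traversal_1(V, delta, S0, states, inputs) -> int:
    R = N = frozenset(states) - V        # start from the invalid states
    while N and not (N & S0):
        F = preimg(N, delta, states, inputs)
        N = F - R                        # predecessors not explored yet
        R = R | N
    return 1 if not (N & S0) else 0


def backward_traversal_2(V, delta, S0, states, inputs) -> int:
    R, F = frozenset(states), frozenset(V)
    while R >= S0 and R != F:            # greatest fixpoint of V & Backimg
        R = F
        F = V & backimg(R, delta, states, inputs)
    return 1 if S0 <= R else 0


def product_machine(d0, l0, d1, l1):
    """Transition and output functions of F0 x F1; output 1 iff outputs agree."""
    delta = lambda s, x: (d0(s[0], x), d1(s[1], x))
    lam = lambda s, x: l0(s[0], x) == l1(s[1], x)
    return delta, lam


# Spec: binary mod-4 counter with enable, output 1 in state 3.
# Impl: one-hot encoded counter (a different state encoding).
spec_delta = lambda s, e: (s + e) % 4
spec_lam = lambda s, e: s == 3
onehot_delta = lambda t, e: t[-1:] + t[:-1] if e else t   # rotate the hot bit
onehot_lam = lambda t, e: t[3] == 1
buggy_lam = lambda t, e: t[2] == 1                         # wrong output tap

inputs = (0, 1)
states = frozenset(product(range(4), product((0, 1), repeat=4)))
S0 = frozenset({(0, (1, 0, 0, 0))})


def check(impl_lam):
    delta, lam = product_machine(spec_delta, spec_lam, onehot_delta, impl_lam)
    V = frozenset(s for s in states if all(lam(s, x) for x in inputs))
    return [alg(V, delta, S0, states, inputs) for alg in
            (forward_traversal, backward_traversal_1, backward_traversal_2)]


print(check(onehot_lam))  # [1, 1, 1]: the encodings are equivalent
print(check(buggy_lam))   # [0, 0, 0]: a mismatching state is reachable
```

The two counters share no correspondence between storage elements, yet the product-machine invariant still decides their equivalence.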

2.4 Symbolic Computations 

Many of the problems mentioned in the previous sections involve the manipulation
of large sets, for example, sets of states or sets of mismatch cases. A tiny piece
of hardware containing one hundred state-holding devices is modeled by a system
with $2^{100}$ states. Any attempt to use an algorithm based on enumerating this
set is a priori going to fail. Fortunately, these sets often have a regular
structure, and their characteristic functions have a manageable size with an
appropriate representation. Appropriate in this context means that there are
efficient algorithms for the operations required by the application. In our case,
these operations include quantification, image computation, basic set operations,
emptiness checks, on-set computation, and Boolean operations. Ordered Binary
Decision Diagrams provide one such representation of Boolean functions, and
consequently of sets. They were first applied by Bryant [4], who recognized their
advantages for the verification of combinatorial circuits.

An Ordered Binary Decision Diagram (BDD) over a set of Boolean variables $X_n$
is a directed acyclic graph with the following properties:
Forward Traversal (V, δ, S0)
{
    R = N = S0
    while ((N ≠ ∅) ∧ (N ⊆ V))
    {
        T = Img(N, δ)
        N = T \ R        /* states reached for the first time */
        R = R ∪ T
    }
    if (N = ∅) return 1
    else return 0
}

Backward Traversal 1 (V, δ, S0)
{
    R = N = S \ V    /* set of invalid states */
    while ((N ≠ ∅) ∧ (N ∩ S0 = ∅))
    {
        F = Preimg(N, δ)
        N = F \ R
        R = R ∪ N
    }
    if (N ∩ S0 = ∅) return 1
    else return 0
}

Backward Traversal 2 (V, δ, S0)
{
    R = S    /* set of all states */
    F = V
    while ((R ⊇ S0) ∧ (R ≠ F))
    {
        R = F
        F = V ∩ Backimg(R, δ)
    }
    if (S0 ⊈ R) return 0
    else return 1
}



Let $V$ be the set of valid states, $\delta$ the transition function, and $S_0$ the set
of initial states of the FSM to be analyzed. After $i$ iterations of the loop in
Forward Traversal, $T$ is the set of successors of the previous frontier, and $N$ is
the subset of $T$ that contains all states reachable in $i$, but not fewer, steps.
After $i$ iterations of the Backward Traversal 1 loop, $F$ is the set of states that
can bring the machine to an invalid state in $i$ steps, and $N$ is the subset of
unexplored states of $F$, i.e., the states whose predecessors have not yet been
considered. After $i$ iterations of the Backward Traversal 2 loop, we know that
any computation of length less than $i$ that starts in $R$ satisfies the invariant.
All procedures return 1 if all reachable states are valid; otherwise they return 0.

Fig. 2. Forward and Backward Traversals 



1. Sink nodes are labeled by Boolean constants. 

2. Each internal node is labeled by a variable from $X_n$ and has two successors,
one reached by the edge labeled low, the other by the edge labeled high.

3. There is a strict order of the variables that is obeyed on all root-to-sink paths.

Any assignment of Boolean values to the variables in $X_n$ defines a root-to-sink
path. The label of the sink matches the value of the represented Boolean function
for that assignment. The size of a BDD is defined as the number of its internal
nodes. Any BDD can be reduced to its minimal form in time linear in its size.
This form is canonical, i.e., it is unique for each function and fixed variable
order. The main advantage of BDDs is the efficiency of their manipulation. The
following operations can be performed in time polynomial in the size of the
manipulated BDDs:

— Boolean and set-theoretic operations;

— cofactors, i.e., restriction of the function by fixing a value of a variable; 

— evaluation of the function for a given input assignment; 

— satisfiability, tautology test, and equivalence; and 

— existential and universal quantification over a constant number of variables. 
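The operations listed above are easy to see in a toy implementation. The following Python sketch is our own minimal reduced ordered BDD, not one of the production packages cited in the text, and all names in it are hypothetical. A unique table guarantees that each (variable, low, high) triple is stored once, so two formulas that denote the same function end up as the same node id, and equivalence becomes an integer comparison. apply, restrict (cofactor), and exists follow the textbook recursive algorithms; real packages add complement edges, garbage collection, and dynamic reordering.

```python
from typing import Dict, List, Tuple

OPS = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}


class BDD:
    """Toy reduced ordered BDD over variables 0..num_vars-1 (0 comes first).

    Node ids 0 and 1 are the sinks. Every internal node (var, low, high) is
    stored once in a unique table, so equal functions get equal ids."""

    def __init__(self, num_vars: int):
        # Sinks are given level num_vars, i.e. below every variable.
        self.nodes: List[Tuple[int, int, int]] = [(num_vars, 0, 0), (num_vars, 1, 1)]
        self.unique: Dict[Tuple[int, int, int], int] = {}
        self.memo: Dict[Tuple[str, int, int], int] = {}

    def mk(self, var: int, low: int, high: int) -> int:
        if low == high:                      # redundant test: no node needed
            return low
        key = (var, low, high)
        if key not in self.unique:           # share isomorphic subgraphs
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def var(self, i: int) -> int:
        return self.mk(i, 0, 1)

    def apply(self, op: str, u: int, v: int) -> int:
        """Shannon expansion on the topmost variable of u and v."""
        if u <= 1 and v <= 1:
            return OPS[op](u, v)
        key = (op, u, v)
        if key not in self.memo:
            vu, lu, hu = self.nodes[u]
            vv, lv, hv = self.nodes[v]
            top = min(vu, vv)
            u0, u1 = (lu, hu) if vu == top else (u, u)
            v0, v1 = (lv, hv) if vv == top else (v, v)
            self.memo[key] = self.mk(top, self.apply(op, u0, v0),
                                     self.apply(op, u1, v1))
        return self.memo[key]

    def neg(self, u: int) -> int:
        return self.apply("xor", u, 1)

    def restrict(self, u: int, i: int, value: int) -> int:
        """Cofactor of u with variable i fixed to value."""
        cache: Dict[int, int] = {}

        def go(n: int) -> int:
            var, low, high = self.nodes[n]
            if var > i:                      # sinks and nodes below level i
                return n
            if var == i:
                return high if value else low
            if n not in cache:
                cache[n] = self.mk(var, go(low), go(high))
            return cache[n]
        return go(u)

    def exists(self, u: int, i: int) -> int:
        return self.apply("or", self.restrict(u, i, 0), self.restrict(u, i, 1))

    def size(self, u: int) -> int:
        """Number of internal nodes reachable from u."""
        seen, stack = set(), [u]
        while stack:
            n = stack.pop()
            if n > 1 and n not in seen:
                seen.add(n)
                stack += self.nodes[n][1:]
        return len(seen)


b = BDD(2)
x0, x1 = b.var(0), b.var(1)
xor = b.apply("xor", x0, x1)
alt = b.apply("or", b.apply("and", x0, b.neg(x1)), b.apply("and", b.neg(x0), x1))
print(xor == alt)                                  # True: same node id
print(b.size(xor))                                 # 3
print(b.exists(b.apply("and", x0, x1), 0) == x1)   # True
```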

Significant engineering effort has been spent on tuning the performance of
BDD algorithms [35], and today there are a number of high-quality, publicly
available BDD packages [15]. Most microprocessor design companies have their
own in-house BDD packages that work as the core engine of their equivalence
checking software.

The first application of BDDs to sequential verification was McMillan's
approach to symbolic reachability analysis and model checking [47,8]. The practical
advantage of symbolic computation is that it provides the set of all solutions at
once. Symbolic computations unify the representation of sets, relations, and
functions, which is useful, in particular, for fixpoint computations based on image
operators.

The main concern about the use of BDDs was their sensitivity to variable
ordering, which can make an exponential difference in their size. This is a serious
drawback, as it requires additional engineering effort even for functions that
can be represented efficiently. This unpleasant feature forces verification
engineers to become experts not only in design but also in BDD techniques.
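A textbook illustration of ordering sensitivity (not an example from our designs) is the function x1·y1 + x2·y2 + x3·y3. The Python sketch below counts ROBDD nodes without building a BDD, using the standard characterization that the nodes labeled with the k-th variable correspond to the distinct subfunctions, obtained by fixing the first k-1 variables, that still depend on the k-th variable. The names robdd_size and pairs are hypothetical. For n pairs, the interleaved order needs 2n nodes, while the order that lists all x's before all y's needs 2^(n+1) - 2.

```python
from itertools import product
from typing import Callable, Dict, Sequence

Assignment = Dict[str, bool]


def robdd_size(f: Callable[[Assignment], bool], order: Sequence[str]) -> int:
    """Count internal ROBDD nodes of f for the given variable order.

    The nodes labeled with order[k] correspond one-to-one to the distinct
    subfunctions obtained by fixing order[0..k-1] that still depend on
    order[k]. Truth-table based, so only meant for a handful of variables."""
    n = len(order)
    total = 0
    for k in range(n):
        subfunctions = set()
        for prefix in product((False, True), repeat=k):
            # Truth table over order[k:], with order[k] varying slowest.
            table = tuple(f(dict(zip(order, prefix + rest)))
                          for rest in product((False, True), repeat=n - k))
            half = len(table) // 2
            if table[:half] != table[half:]:     # depends on order[k]
                subfunctions.add(table)
        total += len(subfunctions)
    return total


def pairs(v: Assignment) -> bool:
    return (v["x1"] and v["y1"]) or (v["x2"] and v["y2"]) or (v["x3"] and v["y3"])


print(robdd_size(pairs, ["x1", "y1", "x2", "y2", "x3", "y3"]))  # 6
print(robdd_size(pairs, ["x1", "x2", "x3", "y1", "y2", "y3"]))  # 14
```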

When it became clear that designs with more than a couple of hundred
state-holding devices would be beyond the capacity of BDD-based verification
tools, researchers started experimenting with new ideas. The satisfiability (SAT)
problem [41] has been studied for decades as the classic NP-complete problem. It
is easy to formulate but hard to solve, and it remains a big challenge for
researchers. Recently, several research groups reported the results of their
experiments with SAT-based verification methods [7,25,46]. There are several
publicly available satisfiability checkers [34,14], and a ranking of the top 10 SAT
solvers can be found at http://www.lri.fr/~simon/satex/satex.php3. The
development of many of them was motivated by the needs of industry. Most of them
are based on the Davis-Putnam procedure [20], enhanced with learning methods,
advanced backtracking techniques, and sophisticated branching heuristics. An
interesting approach to the SAT problem is Stålmarck's method [49]. As a technique
complementary to BDDs, SAT has proved particularly effective for bug-finding
methods, like bounded model checking or STE, but there is no clear winner across
all methods; a combination of both seems to be a good solution [52,43]. In our
environment, SAT-based methods reduced the time spent waiting for failure traces
from days to minutes, or even seconds. This allows fast tuning of transactors and
exposes many bugs. BDD-based methods have been shown to be more useful in the
later stages, after most bugs have been found and validity needs to be proved.
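For readers unfamiliar with the Davis-Putnam family, here is a minimal recursive DPLL sketch in Python: unit propagation plus chronological backtracking on CNF formulas with DIMACS-style integer literals. It deliberately omits the clause learning, non-chronological backtracking, and branching heuristics that the text credits for the performance of modern solvers; it is not the code of any solver cited here, and the name dpll is ours.

```python
from typing import Dict, List, Optional

Clause = List[int]            # DIMACS-style literals: 3 is x3, -3 is NOT x3
Assignment = Dict[int, bool]


def dpll(clauses: List[Clause],
         assignment: Optional[Assignment] = None) -> Optional[Assignment]:
    """Return a (partial) satisfying assignment, or None if unsatisfiable."""
    assignment = dict(assignment or {})
    while True:                                   # unit propagation
        residual: List[Clause] = []
        unit = None
        for clause in clauses:
            if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
                continue                          # clause already satisfied
            free = [lit for lit in clause if abs(lit) not in assignment]
            if not free:
                return None                       # conflict: clause falsified
            if len(free) == 1 and unit is None:
                unit = free[0]
            residual.append(free)
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0          # forced value
    if not residual:
        return assignment                         # every clause satisfied
    var = abs(residual[0][0])                     # naive branching choice
    for value in (True, False):                   # chronological backtracking
        result = dpll(residual, {**assignment, var: value})
        if result is not None:
            return result
    return None


sat = [[1, 2], [-1, 3], [-3, -2]]   # (x1 or x2) and (not x1 or x3) and (not x3 or not x2)
unsat = [[1], [-1, 2], [-2]]
print(dpll(sat))     # {1: True, 3: True, 2: False}
print(dpll(unsat))   # None
```

Variables absent from the returned assignment are unconstrained; any value for them satisfies the formula.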



3 Sequential Equivalence Checking 

In the last decade, the Alpha [22,23] has often been considered the fastest
available microprocessor. This performance advantage was achieved through
advanced architecture and full custom circuit design: with unusual, creative
solutions, designers can achieve much better optimization than any
state-of-the-art synthesis tool. However, humans make mistakes, and no matter how
smart and experienced circuit designers are, their designs require validation; the
transistor-level design needs to be checked against the RTL model.

The traditional way to check that a transistor-level design is functionally
equivalent to the RTL code is through simulation. Generally, there is a suite of
regression tests on which the RTL model is run in parallel with a model extracted
from its transistor-level design. In each cycle, signal values for both models are
computed and compared, and an error is reported in case of a mismatch. The
capacity and speed of our simulator are good enough to perform tests on a full
chip. Unfortunately, there is no way to exhaustively simulate an entire
microprocessor, since the number of possible input sequences is astronomically
large, and it is also hard to estimate how good the test coverage is. When a
mismatch does occur, it can be difficult to locate its source. It is also
expensive to allocate designer time to creating tests for “small” pieces of their
designs. There is a need for a tool that can do both debugging and verification.

Several vendors offer equivalence checkers, but it is hard to find one that
would satisfy all our needs. Our in-house BOolean VErifier (BOVE) is tailored to
our methodology and our designers' style. It runs on our Alpha platforms, and
users get full support and fast responses to their requests. Each project often
requires modifications to the methodology, which are promptly implemented.

Initially, BOVE was developed primarily for debugging, but the tool went
through substantial modifications to extend its functionality and
user-friendliness. It has become an indispensable part of our validation
methodology. The first application of the tool was performed by its developers on
a part of the Alpha 21264 design [12]. During the development of the Alpha 21364,
it was run mostly by designers and architects, with occasional assistance from the
developers. The methodology flow of the Alpha 21464 project expected architects
and designers to use the tool for the entire design.

BOVE is used in a bottom-up manner: each designer applies it repeatedly to
the transistor-level schematic under development. In the process of debugging,
a designer creates the mapping of equivalent signals and specifies the don't
cares and preconditions that are necessary to prove correctness. Once a
schematic has been verified, the verification artifacts are stored for use in
nightly regression runs that check all subsequent modifications of the design.
They can also be reused later when verifying a larger unit containing several
schematics.

3.1 Verification Scheme 

BOVE is an equivalence checker for gate-level networks. It performs all
computations symbolically using BDDs [11,15]. Translators from RTL code and from
transistor-level networks to equivalent gate-level networks allow us to use BOVE
for different tasks: checking the correctness of modified RTL, of modified
schematics, or of new schematics against RTL. We focus on the latter task in this
paper. We refer to the network to be verified as the implementation, and to the
network against which we verify the implementation as the specification.

A problem instance for BOVE consists of a specification, an implementation,
a mapping between the inputs and between the outputs of the two modules, and a
mapping of additional internal compare points that are assumed to be equivalent.
The goal is formulated as follows: assuming equivalence of the inputs of the
modules as described by the input mapping, prove the equivalence of all mapped
nodes (internal compare points and outputs). Compare points allow us to break the
task into smaller subtasks. BOVE keeps track of the individual subtasks that have
been solved successfully and of those that remain, and reports its progress at the
end of each session. For different reasons described in the following sections,
BOVE may fail to prove some of these equivalences, falsely reporting mismatches.
We made a substantial effort to eliminate these false negatives. The main rule
BOVE is built upon is that it never returns a false positive; i.e., if the tool
successfully finishes its task and reports that all compare points are equivalent,
the models being compared are equivalent, given that the inputs satisfy the
relation described by the mapping and the machines are resetable (see the next
section).

3.2 The Equivalence Problem on Slices 

The mappings of compare points implicitly specify the set of separate equivalence 
problems. To prove the equivalence of two nodes, BOVE analyzes the fan-in of 
the nodes (either back to primary inputs or to compare points), which we call a 
slice. The slice inputs are assumed to satisfy the given mappings. BOVE tries to 
prove that the nodes as functions of slice inputs are equivalent. For most of the 
signals, we do not need to investigate the entire fan-in back to the primary inputs. 
BOVE attempts to find the smallest slice suitable for proving the equivalence of 
the two signals. Then it analyzes the slice to determine what algorithm can be 
used for the comparison. 
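Slice extraction is essentially a backward traversal of the netlist. The Python sketch below is a hypothetical illustration (the netlist format and the names slice_of and cut are ours, not BOVE's): it collects the fan-in cone of a node, treating primary inputs and already mapped compare points as slice inputs. BOVE's actual search for the smallest suitable slice is not described here and is not reproduced.

```python
from typing import Dict, List, Set, Tuple

Netlist = Dict[str, List[str]]   # gate output -> its fan-in signals


def slice_of(node: str, netlist: Netlist, cut: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Fan-in cone of `node`, cut at primary inputs and compare points.

    Returns (internal, inputs): the gates inside the slice and the signals
    that act as slice inputs."""
    internal: Set[str] = set()
    inputs: Set[str] = set()
    stack = [node]
    while stack:
        n = stack.pop()
        if n in internal or n in inputs:
            continue
        if n != node and (n in cut or n not in netlist):
            inputs.add(n)                # compare point or primary input
            continue
        internal.add(n)
        stack.extend(netlist.get(n, []))
    return internal, inputs


# out = t AND c,  t = a OR b,  c = u XOR d,  u = NOT a
netlist = {"out": ["t", "c"], "t": ["a", "b"], "c": ["u", "d"], "u": ["a"]}
print(slice_of("out", netlist, cut={"t"}))   # t is a mapped compare point
print(slice_of("out", netlist, cut=set()))   # cone back to the primary inputs
```

Mapping t as a compare point removes its fan-in from the slice, which is how compare points break a task into smaller subtasks.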

A slice that contains neither state elements nor loops is called combinatorial.
The algorithm for combinatorial equivalence is the simplest: we build the BDD
representation for both nodes; because BDDs are canonical and hash tables ensure
that each unique BDD is stored only once, the equivalence check reduces to a
pointer comparison.

In a design with an aggressive clock speed, signals are latched frequently. 
This results in slices that have latches but no loops, called acyclic or pipelined 
slices. The equivalence checking of pipelined slices is based on the idea of viewing
the output signals as functions of inputs indexed by the delays caused by the
latches. This idea allows the problem to be transformed into a combinatorial
problem. The values of an input at different delays are considered to be
independent. In reality, they are consecutive output values of some circuitry,
which might cause a false mismatch report. BOVE represents the outputs with a
data structure called Timed-Ternary BDDs; this kind of BDD is unique to
BOVE [12]. Although Timed-Ternary BDDs do not provide a canonical
representation, they have worked very well on our designs. The
availability of equivalence checking for pipelined designs gives designers freedom 
to depart from simple RTL timing, and to perform the timing optimizations that 
are necessary to achieve the highest performance. 
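The transformation of a pipelined slice into a combinatorial problem can be sketched by giving every input a delay index: a latch simply shifts its fan-in one cycle into the past, and the resulting (input, delay) variables are treated as independent, which is exactly the simplification described above. This Python sketch uses plain truth-table enumeration over hypothetical tuple-encoded netlists; it is not the Timed-Ternary BDD representation used by BOVE.

```python
from itertools import product

# Netlists are nested tuples: ("in", name), ("latch", e), ("and", e1, e2), ("not", e).


def evaluate(expr, env, delay=0):
    kind = expr[0]
    if kind == "in":
        return env[(expr[1], delay)]              # value of the input `delay` cycles ago
    if kind == "latch":
        return evaluate(expr[1], env, delay + 1)  # a latch adds one cycle of delay
    if kind == "and":
        return evaluate(expr[1], env, delay) and evaluate(expr[2], env, delay)
    if kind == "not":
        return not evaluate(expr[1], env, delay)
    raise ValueError(f"unknown node {kind!r}")


def timed_inputs(expr, delay=0):
    """Set of (input, delay) variables the output depends on."""
    kind = expr[0]
    if kind == "in":
        return {(expr[1], delay)}
    if kind == "latch":
        return timed_inputs(expr[1], delay + 1)
    return set().union(*(timed_inputs(e, delay) for e in expr[1:]))


def pipelined_equivalent(spec, impl):
    variables = sorted(timed_inputs(spec) | timed_inputs(impl))
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))        # each (input, delay) is independent
        if evaluate(spec, env) != evaluate(impl, env):
            return False
    return True


a, b = ("in", "a"), ("in", "b")
spec = ("latch", ("latch", ("and", a, b)))                 # RTL: AND, then two latches
retimed = ("latch", ("and", ("latch", a), ("latch", b)))   # one latch moved before the AND
wrong = ("latch", ("and", ("latch", a), b))                # b arrives one cycle early
print(pipelined_equivalent(spec, retimed), pipelined_equivalent(spec, wrong))  # True False
```

The retimed version computes a@2 AND b@2 just like the specification, while the faulty one mixes b@1 into the result and is reported as a mismatch.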

The most general equivalence checking algorithm implemented in BOVE can
handle slices with state-holding devices and loops. Sequential systems of this
type are modeled by FSMs with input set $\{0,1\}^n$ and output set $\{0,1\}$. Storage
elements that encode the state of a machine take values from the set $\{0, 1, X\}$,
where X means unknown. The extension of Boolean logic to ternary logic
is implied by the extension of the AND and NOT operations: AND(X, 0) =
AND(0, X) = 0, AND(X, 1) = AND(1, X) = X, NOT(X) = X, and storage
devices propagate X. Computation with ternary logic is implemented using pairs
of BDDs.
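One common way to realize ternary values with a pair of Boolean functions is a dual-rail encoding, in which one component says whether the signal may be 0 and the other whether it may be 1. The sketch below assumes that encoding (the text does not say which pairing BOVE uses) and uses plain Python booleans for the two rails; in a BDD-based tool, each rail would be a BDD over the inputs. It checks the AND/NOT rules listed above.

```python
from typing import NamedTuple


class Tern(NamedTuple):
    can0: bool   # the signal may be 0
    can1: bool   # the signal may be 1


ZERO, ONE, X = Tern(True, False), Tern(False, True), Tern(True, True)


def t_and(a, b):
    # The result may be 0 if either operand may be 0; it may be 1 only if both may be 1.
    return Tern(a.can0 or b.can0, a.can1 and b.can1)


def t_not(a):
    # Negation swaps the two rails, so NOT(X) = X.
    return Tern(a.can1, a.can0)


def t_or(a, b):
    return t_not(t_and(t_not(a), t_not(b)))   # De Morgan


names = {ZERO: "0", ONE: "1", X: "X"}
for label, val in [("AND(X,0)", t_and(X, ZERO)), ("AND(X,1)", t_and(X, ONE)),
                   ("NOT(X)", t_not(X)), ("OR(X,1)", t_or(X, ONE))]:
    print(label, names[val])   # 0, X, X, 1 respectively
```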

Our FSMs are completely specified, i.e., both transition and output functions 
are defined for any combination of inputs and internal states. Unlike most
commercial equivalence checkers, we do not require any correspondence between
storage elements; designers are free to change the state encoding of the machines!
An FSM is called resetable if there is a sequence of inputs that brings the machine
from the state where all storage elements are set to X to a known binary state.
Checking resetability is outside the scope of BOVE, but BOVE assumes it, and we
use COSMOS [6] to ensure that an FSM is resetable.
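Resetability can be probed conservatively with ternary simulation: start from the all-X state, apply a candidate input sequence, and check whether every storage element has become binary. The sketch below assumes a dual-rail encoding of ternary values (one flag for "may be 0", one for "may be 1") and a hypothetical two-bit machine with a reset input; it is not how COSMOS works internally. Because ternary simulation loses correlations between signals, a negative answer does not prove that a machine is not resetable.

```python
from typing import NamedTuple


class Tern(NamedTuple):
    can0: bool   # the signal may be 0
    can1: bool   # the signal may be 1


ZERO, ONE, X = Tern(True, False), Tern(False, True), Tern(True, True)


def t_and(a, b):
    return Tern(a.can0 or b.can0, a.can1 and b.can1)


def t_not(a):
    return Tern(a.can1, a.can0)


def next_state(state, rst):
    """Hypothetical 2-bit machine: rst = 1 clears both bits,
    otherwise it steps through a four-state cycle."""
    s0, s1 = state
    run = t_not(rst)
    return (t_and(run, s1), t_and(run, t_not(s0)))


def resets(step, n_bits, inputs):
    """Ternary simulation from the all-X state; True if every bit ends up binary."""
    state = (X,) * n_bits
    for inp in inputs:
        state = step(state, inp)
    return all(bit != X for bit in state)


print(resets(next_state, 2, [ONE]))         # True: one reset cycle suffices
print(resets(next_state, 2, [ZERO, ZERO]))  # False: the state stays unknown
```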

Our goal is to prove that once two machines under consideration are success- 
fully initialized (reset), they will always give the same output given the same 
inputs. The state sets of the two machines are disjoint. BOVE builds the prod- 
uct machine of the specification and implementation machines as described in 
Section 2.3. The invariant we want to establish is that the output of the product 
machine is true on the care set specified by the architect. To prove the invariant,
BOVE uses both forward and backward traversal. However, in most cases, the 
set of initial states is not known; therefore, backward traversal is used more 
frequently. The FSM equivalence checking in BOVE [12] was extended by in- 
tegrating a traversal package developed at the University of Torino [37] which 
implements both forward and backward traversals. The tool is integrated as a 
separate engine that receives the BDD representations of the next-state func- 
tions from BOVE, performs an equivalence check, and in the case of an invariant 
violation, generates an initial state and a sequence of inputs that exhibit an in- 
valid state. Assuming resetability of both machines, if BOVE reports a match 
for the product machine, the product machine can be reset to a valid state and
there is no sequence of inputs that causes the machine to enter an invalid state.
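The backward traversal on the product machine can be illustrated with an explicit-state sketch: collect the product states whose outputs differ, grow that set backwards through the transition relation until a fixpoint is reached, and check that the reset state is not in it. The machines here (a modulo-3 counter and a hypothetical Gray-coded re-encoding with one unused code) are made up for illustration, and Python sets stand in for the BDD-based traversal package. The example also shows why the reset assumption matters: product states that are never reached after reset can still lead to a mismatch.

```python
from itertools import product


def backward_reach(states, inputs, step, target):
    """All states from which some input sequence leads into `target`."""
    reach, frontier = set(target), set(target)
    while frontier:                          # iterate pre-images to a fixpoint
        frontier = {s for s in states if s not in reach
                    and any(step(s, i) in frontier for i in inputs)}
        reach |= frontier
    return reach


# Specification: modulo-3 counter, output is true in state 0.
spec_states = [0, 1, 2]


def spec_step(s, inc):
    return (s + inc) % 3


def spec_out(s):
    return s == 0


# Implementation: Gray-coded counter; (1, 0) is an unused code.
GRAY = [(0, 0), (0, 1), (1, 1)]
impl_states = GRAY + [(1, 0)]


def impl_step(t, inc):
    return GRAY[(GRAY.index(t) + inc) % 3] if t in GRAY else (0, 0)


def impl_out(t):
    return t == (0, 0)


prod_states = list(product(spec_states, impl_states))


def prod_step(st, inc):
    return (spec_step(st[0], inc), impl_step(st[1], inc))


bad = {st for st in prod_states if spec_out(st[0]) != impl_out(st[1])}
can_fail = backward_reach(prod_states, [0, 1], prod_step, bad)
print((0, (0, 0)) in can_fail)   # False: the reset state never reaches a mismatch
print((1, (1, 1)) in can_fail)   # True: an unreachable pairing that does
```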

Theoretically, FSM equivalence checking could satisfy all our verification
needs. In reality, however, all three equivalence checking algorithms (for
combinatorial, pipelined, and FSM slices) play an important role in the BOVE compare
procedure. The combinatorial algorithm is simple, and BOVE has good heuristics
for combinatorial slice discovery. Pipelined slices with a large number of latches
would normally cause a state explosion in the FSM verification algorithm, but
easily pass through BOVE’s pipelined verification algorithm. Finally, the FSM
slice algorithm is critical for verifying slices with feedback and devices that
generate Xs.

3.3 Preconditions, Don’t Cares, and Precharge Mapping 

Because of the complexity of modern designs and the restricted capacity of our 
tools, validation is applied to designs piece by piece. This is a non-trivial task,
because for verification purposes a design cannot be decomposed into completely 
independent parts. Even though two signals are equivalent in the context of the 
whole design, they may not be equivalent in the context of one schematic, unless 
we place restrictions on the range of input values for this schematic. BOVE 
allows users to specify preconditions that describe constraints on the values of 
inputs. A typical precondition is the mutual exclusion of some inputs, but any 
FSM may serve as a precondition. 
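The effect of a precondition can be shown on a small combinational example: an AND-OR multiplexer in the RTL and an implementation that combines the two branches with XOR (a hypothetical stand-in for a structure that misbehaves when both selects are active). The two differ only when both select lines are high, so they are equivalent exactly under the mutual-exclusion precondition. This brute-force Python sketch only handles combinational preconditions, whereas BOVE accepts arbitrary FSMs.

```python
from itertools import product


def equivalent(spec, impl, n_inputs, precondition=lambda *v: True):
    """Compare spec and impl on every input vector that satisfies the precondition."""
    for v in product([False, True], repeat=n_inputs):
        if precondition(*v) and spec(*v) != impl(*v):
            return False
    return True


def spec(sa, sb, a, b):          # RTL: AND-OR multiplexer
    return (sa and a) or (sb and b)


def impl(sa, sb, a, b):          # implementation combines the branches with XOR
    return (sa and a) != (sb and b)


def mutex(sa, sb, a, b):         # precondition: select lines never both high
    return not (sa and sb)


print(equivalent(spec, impl, 4))          # False without the precondition
print(equivalent(spec, impl, 4, mutex))   # True under mutual exclusion
```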

Architects writing RTL code describe the required functionality, but at the same
time they leave designers some freedom in the implementation. This is
expressed as don’t care conditions. For instance, a don’t care condition may state that
a signal can take any value while the clock is high, or that some combination of inputs
can never occur. The interpretation of don’t care conditions is different for slice inputs
than for outputs. If there is a don’t care condition associated with a slice output, output
equivalence needs to be proved only for the inputs that violate this condition. In turn,
if a don’t care condition associated with a slice input is satisfied, its mapping is
disregarded; this assures the soundness of BOVE results.
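The two interpretations of don't care conditions can be written down directly in a brute-force check. In the hypothetical slice below, `x_r` and `x_s` are the RTL and schematic sides of one mapped slice input, and the same clock-high condition serves as an input and as an output don't care. The mapping `x_r = x_s` is assumed only where the input don't care does not hold, and equivalence is demanded only where the output don't care does not hold. Dropping the input mapping can only add cases to check, which is why it is sound.

```python
from itertools import product


def equivalent_with_dc(spec, impl, in_dc, out_dc):
    """Slice inputs: clk, y (shared), and x_r / x_s (the two sides of one mapping)."""
    for clk, x_r, x_s, y in product([False, True], repeat=4):
        env = {"clk": clk, "x_r": x_r, "x_s": x_s, "y": y}
        if not in_dc(env) and x_r != x_s:
            continue                 # mapping x_r = x_s is assumed here
        if out_dc(env):
            continue                 # the output may take any value here
        if spec(env) != impl(env):
            return False
    return True


def spec(e):
    return e["x_r"] and e["y"]


def impl(e):
    return e["x_s"] and e["y"]


def clk_high(e):
    return e["clk"]


def never(e):
    return False


print(equivalent_with_dc(spec, impl, in_dc=clk_high, out_dc=clk_high))  # True
print(equivalent_with_dc(spec, impl, in_dc=clk_high, out_dc=never))     # False
```

In the second call the input mapping is dropped while the clock is high but the output is still checked there, so the unconstrained inputs expose a difference.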

Another feature that reflects the gap between specification and implemen- 
tation is an extension of the mapping of compare points to precharge mapping. 
Precharge mapping expresses a more complex relation between the static spec- 
ification and the dynamic implementation of signals; for instance, a schematic 
signal can be precharged high in one clock phase and equivalent to the RTL
signal in a subsequent phase. Following our methodology, precharge mapping can
be derived from the names of the signals. Like equivalence mapping, precharge
mapping is used as an assumption for slice inputs, but must be proved for slice
outputs.
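A precharge mapping can be expressed as a relation that depends on the clock phase. In this hypothetical two-phase sketch, a dynamic NAND node in the schematic must be high during precharge and equal to the RTL NAND during evaluation; a static NAND computes the right function but violates the precharge relation. The phase encoding and all function names are assumptions made for illustration.

```python
from itertools import product

PRECHARGE, EVALUATE = 0, 1


def precharge_related(r, s, phase):
    """Precharge mapping: s is high while precharging and equals r while evaluating."""
    return s if phase == PRECHARGE else s == r


def proves_precharge_mapping(rtl, sch):
    for phase, a, b in product([PRECHARGE, EVALUATE], [False, True], [False, True]):
        if not precharge_related(rtl(a, b), sch(phase, a, b), phase):
            return False
    return True


def rtl_nand(a, b):
    return not (a and b)


def dynamic_nand(phase, a, b):   # precharged high; discharged in evaluation when a = b = 1
    return phase == PRECHARGE or not (a and b)


def static_nand(phase, a, b):    # ignores the clock phase
    return not (a and b)


print(proves_precharge_mapping(rtl_nand, dynamic_nand))  # True
print(proves_precharge_mapping(rtl_nand, static_nand))   # False
```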

3.4 Features for False Mismatch Elimination 

The main reason for false mismatches is the restriction of the context in which we 
attempt to prove the equivalence of signals. In the previous section, we described 
how preconditions associated with the inputs of a piece of the design can help
avoid false mismatches. In other cases, a slice that is created automatically by
BOVE is too small to capture the signal equivalence (see Figure 3). We developed
different heuristics for finding appropriate slice boundaries. Users also have some
influence on slice expansion by temporarily dissociating internal compare points.







Assume that a user specified the following mappings: $A_R = A_S$, $B_R = B_S$, $G_R = G_S$,
$F_R = F_S$. Let us assume we are trying to prove the last equivalence. We create
a slice for $F_R$ and $F_S$ by expanding their fan-in to the first compare points
that are defined by the mappings, i.e., $B_R$, $B_S$, $G_R$, and $G_S$ become the slice
boundaries. Since $F_R = \mathrm{AND}(G_R, B_R)$ and $F_S = \mathrm{XNOR}(G_S, B_S)$, we will get
a mismatch. The mismatch analysis will show that the mismatch occurs when
$G_R = G_S = B_R = B_S = 0$. Looking deeper into the circuit, we can see that
this case can never happen, because $(B_R = 0) \Rightarrow (G_R = 1)$ and similarly
$(B_S = 0) \Rightarrow (G_S = 1)$.

Fig. 3. False mismatch resulting from an inadequate slice 
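The situation in Figure 3 can be reproduced in a few lines. With the slice cut at the compare points G and B, the single mismatching case G = B = 0 appears; once the slice is expanded one level further, the implication (B = 0) ⇒ (G = 1) holds by construction and the mismatch disappears. The fan-in used for B and G (B = P, G = NOT P OR Q, over hypothetical inputs P and Q) is invented for this sketch, and both sides share it because the mappings make them equal.

```python
from itertools import product


def f_rtl(g, b):      # F_R = AND(G_R, B_R)
    return g and b


def f_sch(g, b):      # F_S = XNOR(G_S, B_S)
    return g == b


def mismatches_at_compare_points():
    # G and B are slice inputs, so every combination is allowed.
    return [(g, b) for g, b in product([False, True], repeat=2)
            if f_rtl(g, b) != f_sch(g, b)]


def mismatches_after_expansion():
    # Hypothetical fan-in: B = P and G = NOT P OR Q, so B = 0 forces G = 1.
    found = []
    for p, q in product([False, True], repeat=2):
        b, g = p, (not p) or q
        if f_rtl(g, b) != f_sch(g, b):
            found.append((p, q))
    return found


print(mismatches_at_compare_points())   # [(False, False)]
print(mismatches_after_expansion())     # []
```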



The frequency with which compare points appear in a design to be verified is 
very important as it influences the minimal size of a slice. The distance between 
compare points defines the smallest slice. In cases when the user does not provide 
enough compare points, BOVE can use its automatic mapping routine, which is
based on pseudo-random simulation, to find candidates for compare points. The
accuracy of the routine depends on the number of pseudo-random test runs.
The compare points found are not necessarily equivalent, and it is up to the user
to remove those points that cause mismatches. The mismatch reports on those
nodes cannot be ignored, because their equivalence is used as an assumption in
subsequent equivalence proofs.
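Candidate compare points can be found by simulating both netlists on the same pseudo-random vectors and pairing nodes whose value signatures agree, either directly or after complementing. The Python sketch below is a minimal version of that idea under assumed conventions: dictionary netlists of bitwise lambdas, 64 vectors per machine word, and hypothetical node names. As the text warns, a matching signature only makes a pair a candidate; the equivalence still has to be proved.

```python
import random

WIDTH = 64                       # each word carries 64 random input vectors
MASK = (1 << WIDTH) - 1


def simulate(netlist, inputs):
    """Bit-parallel simulation; netlist maps node -> function of earlier values."""
    values = dict(inputs)
    for name, fn in netlist.items():      # insertion order is a topological order
        values[name] = fn(values) & MASK
    return values


def candidate_compare_points(rtl, sch, input_names, rounds=4, seed=1):
    """Pairs (rtl_node, sch_node, negated) whose simulation signatures agree."""
    rng = random.Random(seed)
    sig = {name: [] for name in [*rtl, *sch]}    # node names must be distinct
    for _ in range(rounds):
        inputs = {n: rng.getrandbits(WIDTH) for n in input_names}
        for netlist in (rtl, sch):
            values = simulate(netlist, inputs)
            for name in netlist:
                sig[name].append(values[name])
    pairs = []
    for r in rtl:
        for s in sch:
            if sig[r] == sig[s]:
                pairs.append((r, s, False))      # candidate r = s
            elif all(x == (~y & MASK) for x, y in zip(sig[r], sig[s])):
                pairs.append((r, s, True))       # candidate r = NOT s
    return pairs


rtl = {"r_and": lambda v: v["a"] & v["b"],
       "r_out": lambda v: v["r_and"] | v["c"]}
sch = {"s_nand": lambda v: ~(v["a"] & v["b"]),
       "s_out": lambda v: ~(v["s_nand"] & ~v["c"])}
print(candidate_compare_points(rtl, sch, ["a", "b", "c"]))
```

On this example the routine should pair `r_and` with the complement of `s_nand` and `r_out` with `s_out`; with fewer rounds, unrelated nodes become more likely to collide by chance.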

In contrast to false mismatches resulting from slices that are too small, slices 
that are too large are hard to debug and sometimes even too hard to verify. 
In an effort to eliminate equivalence checks that fail due to the capacity of the 
tool, we implemented heuristics for variable orders that help keep BDD sizes 
manageable. 
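The text does not describe BOVE's ordering heuristics, but the reason they matter is easy to demonstrate with the classic textbook function x1·y1 + x2·y2 + x3·y3. Its reduced ordered BDD has 6 internal nodes when the variables are interleaved and 14 when all x's precede all y's. The brute-force size counter below is a sketch meant only for tiny functions.

```python
def robdd_size(f, order):
    """Number of internal nodes in the reduced OBDD of f for a variable order.

    Enumerates all 2^n assignments, so it is only suitable for small examples.
    """
    unique = {}                               # (var, low, high) -> node id

    def build(i, env):
        if i == len(order):
            return int(bool(f(env)))          # terminal 0 or 1
        var = order[i]
        low = build(i + 1, {**env, var: False})
        high = build(i + 1, {**env, var: True})
        if low == high:                       # redundant test is dropped
            return low
        return unique.setdefault((var, low, high), len(unique) + 2)

    build(0, {})
    return len(unique)


def f(e):
    return (e["x1"] and e["y1"]) or (e["x2"] and e["y2"]) or (e["x3"] and e["y3"])


print(robdd_size(f, ["x1", "y1", "x2", "y2", "x3", "y3"]))   # 6
print(robdd_size(f, ["x1", "x2", "x3", "y1", "y2", "y3"]))   # 14
```

For n pairs, the gap grows from linear (2n nodes) to exponential (2^(n+1) − 2 nodes), which is why a poor order can exhaust memory on slices that are otherwise easy.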

Some of the false mismatches can be avoided by keeping track of equiva- 
lences that occur just within an RTL description or within a schematic. Since 
the equivalences defined by mappings between the RTL and implementation are 
assumed to be valid for slice inputs, we can propagate all equivalence relations
from specification to implementation and back using these mappings. The
propagation of slice input equivalences is demonstrated with the example shown
in Table 1. The scheme can be more complicated if a node may be mapped to
several nodes or if precharge mapping is involved.



Table 1. Example of slice input propagation

Mapping          RTL Equivalences          Schematic Equivalences
m1: R1 = S1      r1: ¬R2 = R3 = R4         s1: S1 = ¬S2
m2: R2 = S2      r2: R5 = ¬R6              s2: S3 = ¬S4 = S5
m3: R4 = S4
m4: R5 = ¬S5
m5: R6 = S6

The first column describes the mapping between specification and
implementation. The second column contains equivalences within the
specification only, and the third column lists only the equivalences within
the implementation. A = B = C is an abbreviation of (A = B) ∧ (B = C). Given
this information, we can deduce the following:

m1, s1, m2 imply R1 = ¬R2
m2, r1, m3 imply S2 = ¬S4
m3, s2, m4 imply R4 = R5
m4, r2, m5 imply S5 = S6

Consequently,

R1 = ¬R2 = R3 = R4 = R5 = ¬R6

and

S1 = ¬S2 = ¬S3 = S4 = ¬S5 = ¬S6

and we are able to prove relations like NAND(R1, Y) = OR(S5, Y); this could
not be done without slice input propagation.
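One natural way to mechanize this propagation is a union-find structure whose links carry a parity bit (equal or negated). The sketch below is an assumption about how such propagation could be implemented, not a description of BOVE's code. It loads the relations of Table 1 together with the mapping R6 = S6 that the last deduction step relies on, and then answers queries about any two signals.

```python
class ParityUnionFind:
    """Union-find over signals; each link records equal (0) or negated (1)."""

    def __init__(self):
        self.parent, self.parity = {}, {}

    def find(self, x):
        """Return (root, parity of x relative to root), compressing the path."""
        if x not in self.parent:
            self.parent[x], self.parity[x] = x, 0
        if self.parent[x] == x:
            return x, 0
        root, p = self.find(self.parent[x])
        self.parent[x] = root
        self.parity[x] ^= p
        return root, self.parity[x]

    def union(self, x, y, negated=False):
        """Record x = y, or x = NOT y when negated is True."""
        rx, px = self.find(x)
        ry, py = self.find(y)
        if rx == ry:
            if px ^ py != int(negated):
                raise ValueError(f"contradictory relation between {x} and {y}")
            return
        self.parent[rx] = ry
        self.parity[rx] = px ^ py ^ int(negated)

    def relation(self, x, y):
        """None if unrelated; otherwise 'equal' or 'negated'."""
        rx, px = self.find(x)
        ry, py = self.find(y)
        if rx != ry:
            return None
        return "negated" if px ^ py else "equal"


uf = ParityUnionFind()
# RTL-to-schematic mappings, including R6 = S6 used by the last deduction.
for r, s, neg in [("R1", "S1", False), ("R2", "S2", False), ("R4", "S4", False),
                  ("R5", "S5", True), ("R6", "S6", False)]:
    uf.union(r, s, neg)
# RTL equivalences r1, r2 and schematic equivalences s1, s2.
for x, y, neg in [("R3", "R2", True), ("R4", "R3", False), ("R5", "R6", True),
                  ("S1", "S2", True), ("S3", "S4", True), ("S5", "S4", True)]:
    uf.union(x, y, neg)

print(uf.relation("R1", "R6"), uf.relation("S1", "S6"), uf.relation("R1", "S5"))
# negated negated negated
```

A contradiction between mappings and internal equivalences surfaces as a `ValueError`, which in a real tool would point at a wrong mapping.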



3.5 Debugging Features 

Simulators that are used to validate chip-level schematics are usually run on
large parts of a design, the full chip, or even the entire system. Tests are
either created by large verification teams or passed along from previous
projects. We have both focused and random tests. However, these tests are
practically useless for designers who work on their relatively small pieces of the
design. There is no time in their tight schedules for designing their own test sets
that would satisfactorily cover the schematics on which they work. BOVE takes
this worry away from the designers by providing 100% coverage of the design.
A designer only needs a global understanding of the tool, and is not
required to understand the verification algorithms used.
Since BOVE is often used for debugging, it is meant to run quickly on small
pieces and to provide an indication of the problem if a mismatch occurs. The
main feature that helps a user find a bug is mismatch analysis, which varies
with the comparison algorithm used. In the case of the combinatorial algorithm,
users receive a condensed table of all cases where an implementation differs from
its specification. In the case of a pipelined slice, mismatches for the underlying
combinatorial circuit are given. Mismatches of FSMs are a bit more complicated
because they are presented with witnesses of the mismatches; such witnesses
describe an initial state and a sequence of inputs that causes the machines to
produce different outputs. Using its built-in simulator, BOVE can create a
visual representation of an error trace containing inputs, outputs, and signals
identified by a user. The error trace is presented either as a waveform or as a list
of values for each time step. In addition to mismatch analysis, BOVE is able to
detect simple errors, like a missing inverter, and can suggest how to
correct them. If the mismatch is a potential false mismatch due to an inappropriate
choice of slice, BOVE reports the slice input misalignment. In any event, a user
may always ask for slice analysis, which shows various slice statistics.
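A condensed mismatch table of the kind described for combinatorial slices can be approximated by enumerating the input vectors on which the two functions differ and merging them into cubes, in the style of Quine-McCluskey. The sketch also includes a trivial check that flags a likely missing inverter when the implementation is the complement of the specification everywhere. Both functions are illustrative assumptions; BOVE's actual report format is not described in the text.

```python
from itertools import product


def mismatch_table(spec, impl, n):
    """Prime cubes ('-' = any value) covering all vectors where impl != spec."""
    current = {"".join(map(str, v)) for v in product((0, 1), repeat=n)
               if spec(*v) != impl(*v)}
    primes = set()
    while current:
        merged, used = set(), set()
        for a in current:
            for b in current:
                diff = [i for i in range(n) if a[i] != b[i]]
                # Cubes with the same '-' positions that differ in one bit merge.
                if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
                    i = diff[0]
                    merged.add(a[:i] + "-" + a[i + 1:])
                    used.update((a, b))
        primes |= current - used          # cubes that could not be enlarged
        current = merged
    return sorted(primes)


def looks_like_missing_inverter(spec, impl, n):
    """True when impl is the complement of spec on every input vector."""
    return all(spec(*v) != impl(*v) for v in product((0, 1), repeat=n))


def spec(a, b, c):
    return (a & b) | c


def impl(a, b, c):                 # the OR with c was dropped
    return a & b


def impl_inverted(a, b, c):        # output polarity flipped
    return 1 - (a & b)


print(mismatch_table(spec, impl, 3))                         # ['-01', '0-1']
print(looks_like_missing_inverter(impl, impl_inverted, 3))   # True
```

Each cube reads as a row of the condensed table: here the implementation is wrong whenever c = 1 and either a = 0 or b = 0.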

3.6 Conclusion 

Although simulation remains the dominant functional verification method, the
use of formal methods, and particularly equivalence checking, is a critical part of
the Alpha microprocessor development process. Our in-house equivalence checker
BOVE has replaced simulation as a means to verify transistor-level schematics.
As complexity increases, the performance advantage offered by BOVE is critical
to our validation needs. Many of BOVE’s most useful features were motivated
by requests from users. Without the tight interaction between developers and
designers, BOVE would not be what it is now. The Alpha development team also
uses other formal methods, but in a less systematic manner.
Abstract. Knowledge discovery, that is, analyzing a given massive data
set and deriving or discovering some knowledge from it, has become
an important subject in several fields including computer science.
Good software is in demand for various knowledge discovery
tasks. For such software, we often need to develop efficient algorithms
for handling huge data sets. Random sampling is one of the important
algorithmic methods for processing huge data sets. In this paper, we 
explain some random sampling techniques for speeding up learning al- 
gorithms and making them applicable to large data sets [15,16,4,3]. We 
also show some algorithms obtained by using these techniques. 



1 Introduction 

For knowledge discovery, or more specifically, for a certain kind of data mining
task, it would be quite helpful if we could use learning algorithms on a very large
data set. In this paper, I explain some random sampling techniques we have
developed for scaling up learning algorithms. But before going into a technical
discussion, let us see in general what is expected of us computer scientists in
knowledge discovery, and locate our problem there.

For investigating various problems for knowledge discovery, we had in Japan 
a three-year research project “Discovery Science” (chair Setsuo Arikawa, Kyushu 
Univ.) from April 1998 to March 2001, sponsored by the Ministry of Education, 
Science, Sports and Culture. This is a quite large project involving many com- 
puter scientists and researchers in related fields, from philosophers and statis- 
ticians to scientists struggling with actual data. As a member of this project, I 
have learned many aspects of knowledge discovery. What I will explain below 
is my personal view on knowledge discovery that I have developed through my 
experience. Due to the space limit, I cannot explain examples in detail; I cannot
even cite and list related papers. Please refer to a forthcoming book reporting
our achievements in the Discovery Science Project, which will be published
by Springer. (On the other hand, for the techniques explained in the following
sections, please refer to [15,16,4,3] as their original sources.)

* A part of this work is supported by the Ministry of Education, Science, Sports and 
Culture, Grant-in-Aid for Scientific Research on Priority Areas (Discovery Science),
1998-2001. 
The ultimate goal of computer science is to provide good computer software
(and sometimes computer systems) that supports human activities. This
is also true for knowledge discovery. We want good software that helps us to
analyze “complex data” and discover some “useful rules” hidden in the data. But
our approach may differ depending on the type of “complexity” of the data we
are given and also on the type of “usefulness” of the rule we are aiming for;
roughly speaking, there are two basic approaches.

The first approach is to develop (semi-)automatic discovery systems. This is
the case where a given data set consists of a large number of pieces of information 
of the same type and it is not required to obtain the very best rule explaining 
the data. 

For example, K. Yada et al. used several data mining tools and analyzed a
huge set of sales data from a big drugstore chain to find a purchase pattern of a
“loyal customer”, a customer who will become profitable to the stores. Here it
may not be so important to obtain the best rule for this loyal customer prediction.
As pointed out by Yada et al., a simple rule would be rather useful for business
so long as it has reasonable predictive power. An important and nontrivial
task that they did was to design efficient and robust software that converts
actual purchase records to a data set applicable to data mining tools. In general,
it is an interesting subject of software engineering to design such conversion
software systematically. On the other hand, once we have a nice and clean data
set from the actual data, we could use various learning algorithms. Note that
these learning algorithms should work very efficiently and should be applicable
to a very large data set. This is where we can use the random sampling
techniques explained in the following sections.

On the other hand, it is sometimes the case that the complexity of data
is due not only to its volume, or that we cannot assume any similarity in the data. A
typical example is a DNA sequence. Suppose that we have a set of complete
DNA sequences of some single person. Certainly, the amount of information is
huge, but each piece of information does not have the same role. In such a case,
we cannot hope for any automatic discovery system. Instead, we aim to develop
discovery systems that work with human intervention. This is the second approach.

For example, R. Honda et al. tried to develop software that processes satellite
images of the lunar surface and finds craters. They first tried semi-automatic software
using an unsupervised learning algorithm, Kohonen’s self-organizing map.
But it turned out that the level of noise differs a lot depending on the image. That
is, the data is complex and not of the same type. Thus they changed their strategy
and aimed for software that works in collaboration with a human expert; this
yielded quite successful software.

In the extreme case, the role of software is to help human experts make
their discoveries. This applies to the case where we want to find the very best rule
from a limited number of examples. Note that even if the number of examples
is limited, each example is complex, consisting of a large amount of data like
the DNA sequence of one person. Thus, efficient software that filters out
and suggests important points would be quite useful. Y. Ohsawa introduced the
notion of a “key graph” to state potential motivations or causes of some events.
He developed software that extracts a key graph from data and applied it to
the history of earthquakes to predict some risky areas. H. Tsukimoto et al. made
software that processes fMRI brain images and finds activated areas in the
brain. Using his algorithm, it is possible to point out some candidate areas
of activation from very fuzzy original image data. Learning algorithms are
also useful for filtering out unnecessary attributes. O. Maruyama, S. Miyano, et al.
proposed software that, using some learning algorithm, not only filters out
irrelevant attributes but also creates a “view”, a combination of attributes, that
gives a specific way of understanding given data. Visualization is also useful
for human experts.

So far, we have been considering software development. But techniques or
methods developed in computer science could themselves also be useful for
scientific discovery. H. Imai et al. used several data compression algorithms to study
the redundancy or incompressibility of DNA sequences, which may lead, in the
future, to some important scientific discovery on DNA sequences. S. Kurohashi et al.
used a digitized Japanese dictionary to study the Japanese noun phrase “$N_1$
no $N_2$”. Though “no” in Japanese roughly corresponds to “of” in English, its
semantic role had not been clearly explained by Japanese language
scholars. By computer-aided search through a digitized Japanese dictionary, S.
Kurohashi et al. discovered that the semantic role of “no” (of certain types) can
be found in the definition statement of the noun $N_2$ in the dictionary. This may
be a type of discovery that researchers other than computer scientists could not
think of.

Now as we have seen, there are various types of interesting subjects in
knowledge discovery. But in the following, we will focus on the case where we 
need to handle a large number of examples of the same type, and explain some 
sampling techniques that would be useful for such a case. 



2 Sequential Sampling 

First let us consider a very simple task of estimating the proportion of instances
with a certain property in a given data set. More specifically, we consider an
input data set D containing a huge number of instances and a Boolean function
B defined on instances in D. Then our problem is to estimate the proportion $p_B$ of
instances x in D such that $B(x) = 1$ holds. (Let us assume that D is a multiset;
that is, D may contain the same object more than once.) Clearly, the value $p_B$
is obtained by counting the number of instances x in D such that $B(x) = 1$. But
since we consider the situation where D is huge, it is impractical to go through
all instances of D for computing $p_B$. A natural strategy that we can take in
such a situation is random sampling. That is, we pick some elements of D
at random and estimate the average value of B on these selected examples.

If we were asked for the precise value of $p_B$, then trivially we would have to
go through the whole data set D. Here consider the situation where we only
need to get $p_B$ “approximately”. That is, it is enough to compute $p_B$ within
a certain error bound required in each context. Also, due to the random nature
of sampling, we cannot always obtain a desired answer. Thus, we must be satisfied
if our sampling algorithm yields a good approximation of $p_B$ with “reasonable
probability”. We discuss this type of estimation problem. That is, we consider
the following problem.

Problem 1. For given $\delta > 0$ and $\varepsilon$, $0 < \varepsilon < 1$, obtain an estimate $\hat{p}_B$ of $p_B$
satisfying the following. (Here the probability is taken over the randomness of
the sampling algorithm being used.)

$$\Pr\big[\, |\hat{p}_B - p_B| \le \varepsilon\, p_B \,\big] > 1 - \delta. \qquad (1)$$

This problem can be solved by the following simple sampling algorithm:
Choose n instances from D uniformly at random, count the number m of
instances x with $B(x) = 1$, and output the ratio m/n as $\hat{p}_B$. Here the sample size n is
a key factor for sampling, and for determining an appropriate sample size, so-called
concentration bounds or large deviation bounds have been used (see, e.g., [19]).
For example, the Chernoff bound gives the following bound for n.

Theorem 1. For any $\delta > 0$ and $\varepsilon$, $0 < \varepsilon < 1$, if the sample size n satisfies the
following inequality, then the output of the simple sampling algorithm mentioned
above satisfies (1).

$$n > \frac{3}{\varepsilon^2 p_B} \ln\frac{2}{\delta}$$
Notice, however, that this bound is hard to use in practice, because for
estimating an appropriate sample size n, we need to know $p_B$, which is what we
want to estimate! Even if we could guess some appropriate value p for $p_B$, we
would need to use p such that $p \le p_B$, just to be safe. Then if we underestimate $p_B$
and use a much smaller p, we end up using an unnecessarily large sample size.
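The practical difficulty is easy to see numerically. The sketch below uses the Chernoff-style condition n > 3 ln(2/δ) / (ε² p), with a guess p standing in for the unknown p_B, together with the simple fixed-size sampling algorithm; the function names are hypothetical.

```python
import math
import random


def chernoff_sample_size(eps, delta, p):
    """Smallest integer n with n > 3 ln(2/delta) / (eps^2 p), for a guess p <= p_B."""
    return math.floor(3 * math.log(2 / delta) / (eps ** 2 * p)) + 1


def simple_sampling(data, B, n, rng=random):
    """Estimate p_B from n instances drawn uniformly (with replacement) from data."""
    m = sum(bool(B(rng.choice(data))) for _ in range(n))
    return m / n


for guess in (0.1, 0.01, 0.001):
    print(guess, chernoff_sample_size(eps=0.1, delta=0.05, p=guess))
```

With ε = 0.1 and δ = 0.05, the three guesses require roughly 1.1 × 10^4, 1.1 × 10^5 and 1.1 × 10^6 samples: every tenfold underestimate of p_B costs a tenfold larger sample.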

This problem may be avoidable if we perform sampling in an on-line or 
sequential fashion. That is, a sampling algorithm obtains examples sequentially 
one by one, and it determines from these examples whether it has already seen 
enough examples. Intuitively, from the examples seen so far, we can
more or less obtain some knowledge of the input data set, and it may be possible
to estimate the parameters required by the statistical bounds. Thus, we do not 
fix sample size in advance. Instead sample size is determined adaptively. Then 
we can use more appropriate sample size for the current input data set. 

For Problem 1, such a sequential sampling algorithm has been proposed by 
Lipton et al. [28,29], which is stated in Figure 1. (Here we present a slightly
simplified version.) 

As we can see, the structure of the algorithm is simple. It runs until it has seen
at least A examples x with $B(x) = 1$. To complete the algorithm, we have
to specify a way to determine A. For example, the Chernoff bound justifies
the following choice. (In [28] the bound from the Central Limit Theorem is used,
which gives a better sample size.)
program SampleAlg-1
  m ← 0; n ← 0;
  while m < A do
    get x uniformly at random from D;
    m ← m + B(x); n ← n + 1;
  output m/n as an approximation of p_B;
end.

Fig. 1. Sequential Sampling Algorithm for Problem 1



Theorem 2. In the algorithm of Figure 1, we compute A by

$$A = \frac{3(1+\varepsilon)}{\varepsilon^2} \ln\frac{2}{\delta}.$$

Then for any $\delta > 0$ and $\varepsilon$, $0 < \varepsilon < 1$, the output of the algorithm satisfies (1).

Note that A does not depend on $p_B$. Thus, we can execute the sampling
algorithm without knowing $p_B$. On the other hand, the following bound on
the algorithm’s sample size is also provable. Comparing it with the bound of
Theorem 1, we see that the number of required examples is almost the same as
in the situation where we knew $p_B$ in advance.
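A direct Python rendering of SampleAlg-1 is shown below as a sketch, using the Chernoff-based threshold A = 3(1 + ε) ln(2/δ) / ε². The function name and the with-replacement sampling through `random.choice` are assumptions; note that the loop terminates only if p_B > 0.

```python
import math
import random


def sample_alg_1(data, B, eps, delta, rng=random):
    """Sequential sampling: draw until A positives are seen; return (m/n, n).

    Loops forever if no instance satisfies B (p_B = 0).
    """
    A = 3 * (1 + eps) * math.log(2 / delta) / eps ** 2
    m = n = 0
    while m < A:
        x = rng.choice(data)          # uniform draw from D, with replacement
        m += bool(B(x))
        n += 1
    return m / n, n


data = [1] * 5 + [0] * 95             # p_B = 0.05
estimate, n = sample_alg_1(data, lambda x: x == 1, eps=0.2, delta=0.1,
                           rng=random.Random(0))
print(estimate, n)
```

Here A ≈ 269.6, so the loop stops at the 270th positive instance, after about 270 / 0.05 = 5400 draws on average; the sample size adapts to p_B without the algorithm ever being told its value.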



Theorem 3. Consider the sample size n used by the algorithm of Figure 1.
Then with probability $> 1 - \delta/2$, we have

$$n \le \frac{3(1+\varepsilon)}{(1-\varepsilon)\,\varepsilon^2 p_B} \ln\frac{2}{\delta}.$$

While this is a typical example of sequential sampling, some sequential sam- 
pling algorithms have been developed recently (i) that can be applied to general 
tasks, and (ii) that have theoretical guarantees of correctness and performance 
[14,15,34]. Let us see here one application of such algorithms. 

In some situations, we would like to estimate not $p_B$ but some other value
computed from $p_B$. In [15], a general algorithm for achieving such a task is
proposed. Here, as one example, we consider the problem of estimating
$u_B = p_B - 1/2$. (This problem arises when implementing “boosting” techniques,
which will be explained in the next section.)

Problem 2. For given $\delta > 0$ and $\varepsilon$, $0 < \varepsilon < 1$, obtain an estimate $\hat{u}_B$ of $u_B$
satisfying the following. (To simplify our discussion, let us assume here that
$u_B > 0$.)

$$\Pr\big[\, |\hat{u}_B - u_B| \le \varepsilon\, u_B \,\big] > 1 - \delta. \qquad (2)$$

Problem 2 is similar to Problem 1, but these two problems have different critical
points. That is, while Problem 1 gets harder when $p_B$ gets smaller, Problem 2
gets harder when $u_B = p_B - 1/2$ gets smaller. In other words, the closer $p_B$ is
to 1/2, the more accurate an estimate is necessary, and hence the larger the sample
that is needed. Thus, Problem 2 should be regarded as a different problem, and a new
sampling algorithm is necessary.

program SampleAlg-2
  m ← 0; n ← 0;
  u ← 0; α ← ∞;
  while u < α(1 + 1/ε) do
    get x uniformly at random from D;
    m ← m + B(x); n ← n + 1;
    u ← m/n − 1/2;
    α ← √((1/(2n)) ln(n(n + 1)/δ));
  output u as an approximation of u_B;
end.

Fig. 2. Sequential Adaptive Sampling Algorithm for Problem 2

For designing a sequential sampling algorithm for estimating $u_B$, one might
want to modify the algorithm of Figure 1. For example, by replacing its
while-condition “m < A” with “m − n/2 < B” and by choosing B appropriately, we
may be able to satisfy the new approximation goal. But this naive approach does
not work. Fortunately, however, we can deal with this problem by using a slightly
more complicated stopping condition. In Figure 2, we state a sequential sampling
algorithm for Problem 2. Note that the algorithm does not use any information
on $u_B$; hence, we can use it without knowing $u_B$ at all. On the other hand, as
shown in the following theorem, the algorithm estimates $u_B$ with the desired
accuracy and confidence. Also, the sample size is bounded, with high probability,
by $O\big((1/(\varepsilon u_B))^2 \log(1/(\delta u_B))\big)$, which, ignoring the log factor, is the same
as in the situation where $u_B$ were known in advance.
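SampleAlg-2 translates just as directly. The sketch below follows the stopping rule u ≥ α(1 + 1/ε) with α = √(ln(n(n + 1)/δ) / (2n)), recomputed after every draw. The function name and the with-replacement sampling are assumptions, and the loop terminates only when u_B > 0.

```python
import math
import random


def sample_alg_2(data, B, eps, delta, rng=random):
    """Adaptive sequential estimate of u_B = p_B - 1/2 (assumes u_B > 0)."""
    m = n = 0
    u, alpha = 0.0, math.inf
    while u < alpha * (1 + 1 / eps):
        x = rng.choice(data)                     # uniform draw, with replacement
        m += bool(B(x))
        n += 1
        u = m / n - 0.5                          # current estimate of u_B
        alpha = math.sqrt(math.log(n * (n + 1) / delta) / (2 * n))
    return u, n


data = [1] * 7 + [0] * 3               # p_B = 0.7, so u_B = 0.2
u, n = sample_alg_2(data, lambda x: x == 1, eps=0.5, delta=0.1,
                    rng=random.Random(0))
print(u, n)
```

Since α shrinks roughly like √(log n / n), the loop keeps drawing until the estimated margin u clearly dominates its own confidence radius, so a smaller u_B automatically yields a larger sample.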

Theorem 4. For any $\delta > 0$ and $\varepsilon$, $0 < \varepsilon < 1$, the output of the algorithm of
Figure 2 satisfies (2). Furthermore, with probability more than $1 - \delta$, we have

sample size n < In ( — — ) . 

(SUb) \£0Ub J 



Some Comments on Related Work 

Since the idea of “sequential sampling” or “sampling on-line” is quite natural 
and reasonable, it has been studied in various contexts. 

First of all, we should note that statisticians had already made significant
advances in sequential sampling during World War II [36]. In fact, from
their activities, a research area on sequential sampling, called sequential analysis,
was formed in statistics. For example, performing random sampling until
the number of “positive observations” exceeds a certain limit has been studied
in depth in statistics. For recent studies on sequential analysis, see, e.g., [24].

In computer science, adaptive sampling techniques have been studied in the 
database community. The algorithm of Figure 1 was proposed [28,29] for esti- 
mating query size in relational databases. Later Haas and Swami [26] proposed 
an algorithm that performs better than this in some situations. More recently,
Lynch [30] gave a rigorous analysis of the algorithms proposed in [28,29].

One can see the spirit of adaptive sampling, i.e., using the instances observed
so far to reduce the current and future computational tasks, in some algorithms
proposed in the computational learning theory and machine learning communities.
For example, the Hoeffding race proposed by Maron and Moore [31] attempts
to reduce a search space by removing candidates that are judged hopeless
from the instances seen so far. A more general sequential local search has been
proposed by Greiner [25]. More recently, sequential sampling algorithms have
been developed for general data mining tasks, and their performance has been
analyzed both theoretically and experimentally; see, e.g., [15,34].

In the study of randomized algorithms, some sequential sampling algorithms
have also been proposed and analyzed [12].

3 An Application of Sequential Sampling to Boosting 

Problem 2 discussed in the previous section arises when we want to design a 
simple learner based on random sampling. Such a simple learner may be too 
weak for making a reliable hypothesis. But it can be used as a weak learner in 
a boosting algorithm. (Here by a learner we mean an algorithm that produces
a mechanism or a rule, called a hypothesis, for making a yes/no classification of a
given example. Let us again use D to denote a given data set, which contains a huge
number of instances, each of which expresses an object that we call an example in
this paper. Note that each example has a yes/no classification label. That is, we are
given the classification of the examples in D, and our goal is to find a good
hypothesis explaining these classifications.)

A boosting algorithm is a way to design a strong learner yielding a reliable
hypothesis by using a weak learner. Almost all boosting algorithms follow a
common outline. A boosting algorithm runs a weak learner several times, say
T times, under distributions $\mu_1, \ldots, \mu_T$ that are slightly modified from the given
distribution $\mu$ on D, and collects weak hypotheses $h_1, \ldots, h_T$. A final hypothesis
is built by combining these weak hypotheses. Here the key idea is to put more
weight, when making a new weak hypothesis, on “problematic instances” for
which the previous weak hypotheses perform poorly. That is, at the point when
$h_1, \ldots, h_t$ have been obtained, the boosting algorithm computes a new
distribution $\mu_{t+1}$ that puts more weight on those instances that have been
misclassified by most of $h_1, \ldots, h_t$. Then a new hypothesis $h_{t+1}$ produced by a weak
learner on this distribution $\mu_{t+1}$ should be strong on those problematic instances,
thereby improving the performance of the combined hypothesis built from
$h_1, \ldots, h_{t+1}$.

Boosting algorithms typically differ in the weighting scheme used to compute
modified distributions. In [23] an elegant weighting scheme was introduced by
Freund and Schapire, by which they defined a boosting algorithm called AdaBoost.
It was proved that AdaBoost has a nice “adaptive” property; that is, it
adapts to the quality of an obtained weak hypothesis so that the boosting
process can converge quickly when better weak hypotheses are obtained. Furthermore,
many experimental results have shown that AdaBoost is indeed useful
for designing a good learner for several practical applications.

Now we would like to use AdaBoost to design a classification algorithm that 
can handle a huge data set. For this, we have to solve two algorithmic problems: 
(i) how to design a weak learner, and (ii) how to implement the boosting process. 

Consider the first problem. AdaBoost does not specify its weak learner. Of
course, a better learner yields a better hypothesis, which speeds up the boosting
process. But a heavy weak learner is not appropriate for handling a large data
set; for example, C4.5, a well-known decision tree induction program, would be too
costly to be used as a weak learner if it were run directly on the original huge
data set $D$. Here random sampling can be used to reduce the data set size. For
our discussion, let us take the following simplest approach: we consider simple
weak hypotheses so that the set $\mathcal{H}$ of all weak hypotheses remains relatively
small, and then search exhaustively for a hypothesis $h \in \mathcal{H}$ that performs well on
instances randomly sampled from $D$. For implementing this approach as a weak
learning algorithm, Problem 2 (and its variation) becomes important.

Let us consider our simple approach in more detail. AdaBoost works with
any weak learner, subject to one requirement: for AdaBoost (and in
fact any boosting algorithm) to work, the weak learner must
(almost) always provide a hypothesis that is better than a random guess. Let
$\mathrm{cor}_\mu(h)$ be the probability that a hypothesis $h$ correctly classifies an instance
of $D$ generated randomly according to distribution $\mu$. Then $\mathrm{adv}_\mu(h) =
\mathrm{cor}_\mu(h) - 1/2$ is called the advantage of the hypothesis $h$. Note that the probability
that a random guess is correct is $1/2$; hence, the advantage of $h$ shows how
much better $h$ is than a random guess. In order for AdaBoost to work, a weak
learner is required to yield a hypothesis $h_t$ (at each $t$th boosting
step) whose advantage $\mathrm{adv}_{\mu_t}(h_t)$ is positive; of course, the larger the advantage,
the better. The task of a weak learner is then to search for a hypothesis with
(nearly) the largest advantage. We are considering a weak learner that uses random
sampling to estimate $\mathrm{adv}_{\mu_t}(h)$ for each $h \in \mathcal{H}$ (and selects the one with the best
estimated advantage). This estimation is indeed our Problem 2. To guarantee
that a selected hypothesis $h \in \mathcal{H}$ is nearly the best (with high probability), the
sample size must be determined in terms of the best advantage $\gamma$; more
instances are needed if $\gamma$ is close to 0 (in other words, if the correct probability of
the best hypothesis is close to $1/2$). That is, what matters is the advantage
of each hypothesis and not its correct probability. Hence, the problem we need
to solve is Problem 2 and not Problem 1. Furthermore, the best advantage is
not known in advance. Thus, this is a situation where sequential sampling is
needed, and the algorithm of Figure 2 serves the purpose.
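To make the idea concrete, here is a minimal Python sketch of adaptive hypothesis selection. It is not the algorithm of Figure 2, whose details are not reproduced here; the stopping rule, the confidence radius `eps` and the name `sequential_select` are our own illustrative assumptions. Examples are drawn one at a time, the empirical advantage of every $h \in \mathcal{H}$ is updated, and sampling stops once the best empirical advantage is at least twice a Hoeffding-style confidence radius. A larger best advantage $\gamma$ therefore leads to an earlier stop, without $\gamma$ being known in advance.

```python
import math

def sequential_select(hypotheses, draw_example, delta=0.05, max_samples=1_000_000):
    """Adaptively select a hypothesis with a large estimated advantage.

    hypotheses   : list of callables h(x) -> label in {+1, -1}
    draw_example : hypothetical callable returning a labelled pair (x, y)
                   drawn from the current distribution
    The stopping rule is an illustrative assumption, not that of Figure 2.
    Returns (index of selected hypothesis, its empirical advantage, #draws).
    """
    k = len(hypotheses)
    correct = [0] * k
    n = 0
    while n < max_samples:
        x, y = draw_example()
        n += 1
        for i, h in enumerate(hypotheses):
            if h(x) == y:
                correct[i] += 1
        # Hoeffding radius with a union bound over k hypotheses and all n
        eps = math.sqrt(math.log(2 * k * n * (n + 1) / delta) / (2 * n))
        best = max(range(k), key=correct.__getitem__)
        adv = correct[best] / n - 0.5
        if adv >= 2 * eps:  # best advantage is clearly positive: stop
            return best, adv, n
    best = max(range(k), key=correct.__getitem__)
    return best, correct[best] / n - 0.5, n
```

Every drawn example is evaluated by all $|\mathcal{H}|$ hypotheses, so the work per example grows with $|\mathcal{H}|$, while the number of draws is driven mainly by $1/\gamma^2$ and does not depend on the size of $D$.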

From Theorem 4, we can prove that the weak learner designed by using
our sequential sampling algorithm needs to see roughly $O(|\mathcal{H}|/\gamma^2)$ instances
(ignoring some logarithmic factor) and runs within time proportional to this
sample size. Thus, this weak learner runs with reasonable speed if both $|\mathcal{H}|$ and
$1/\gamma$ are bounded within some range; in particular, an important point is that
its running time does not depend on the size of the data set $D$. For example, we
conducted some experiments [17] using the set of decision stumps, i.e., one-node
decision trees, as the class $\mathcal{H}$. Our experiments show that the obtained weak
learner runs in a reasonable amount of time in most cases, independent of the data
set size.
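A finite class of decision stumps can be enumerated directly. The sketch below is an illustration under our own assumptions: each example is a tuple of numeric features, the candidate thresholds are supplied by the caller, and `decision_stumps` is a hypothetical helper name. The class size is $|\mathcal{H}| = 2 \cdot (\text{number of features}) \cdot (\text{number of thresholds})$, the quantity that enters the $O(|\mathcal{H}|/\gamma^2)$ bound above.

```python
from itertools import product

def decision_stumps(num_features, thresholds):
    """Enumerate a finite class H of decision stumps (one-node decision trees).

    A stump looks at one feature f, compares it with a threshold theta and
    predicts `sign` if x[f] <= theta and -sign otherwise.
    """
    stumps = []
    for f, theta, sign in product(range(num_features), thresholds, (1, -1)):
        # default arguments freeze the loop variables inside each closure
        stumps.append(lambda x, f=f, theta=theta, sign=sign:
                      sign if x[f] <= theta else -sign)
    return stumps
```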

Next consider the second problem, i.e., the implementation of the boosting
process. In the boosting process (of AdaBoost and any other boosting algorithm),
it is necessary to generate training examples under distributions modified from
the original distribution^. Thus, some implementation of this example generation
procedure is necessary. So far, we have two implementation frameworks:
(i) boosting by sub-sampling and (ii) boosting by filtering [20].

In the sub-sampling framework, before starting the boosting process, a set $S$
of a certain number of instances is chosen from $D$, and only elements in $S$
are used during the boosting process. More specifically, at each boosting step $t$,
the modified distribution $\mu_t$ at this step is defined on $S$, and the table of
probabilities $\mu_t(x)$ for all $x \in S$ is calculated. Then we run a weak learning algorithm,
supplying examples chosen from $S$ according to their probabilities. (In many
situations, it is not necessary to use any learning algorithm. We can simply find
a weak hypothesis $h$ that is best on $S$ under $\mu_t$ by calculating the correct
probability $\mathrm{cor}_{\mu_t}(h) = \sum_{x \in S(h)} \mu_t(x)$, where $S(h) = \{x \in S : h \text{ is correct on } x\}$.)
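In the sub-sampling framework, $\mathrm{cor}_{\mu_t}(h)$ can be computed exactly from the probability table on $S$. The following sketch selects the best hypothesis this way, without running any learning algorithm; the helper name `best_on_subsample` and the representation of $S$ and $\mu_t$ as aligned Python lists are our own assumptions.

```python
def best_on_subsample(hypotheses, S, mu):
    """Pick the hypothesis maximizing cor_mu(h) = sum of mu(x) over x in S(h).

    S  : list of labelled examples (x, y) forming the fixed sub-sample
    mu : list of probabilities mu_t(x), aligned with S
    Returns (index of the best hypothesis, its correct probability).
    """
    def cor(h):
        # probability mass of the examples that h classifies correctly
        return sum(p for (x, y), p in zip(S, mu) if h(x) == y)

    scores = [cor(h) for h in hypotheses]
    best = max(range(len(hypotheses)), key=scores.__getitem__)
    return best, scores[best]
```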

In the filtering framework, on the other hand, we run a weak learning algorithm
directly on the original data set $D$. Whenever an example is requested by the weak
learner, it is generated from $D$ by using a “filtering procedure” that yields an
instance of $D$ under the modified distribution $\mu_t$.
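One standard way to realize such a filtering procedure is rejection sampling, which is also what FiltEx in Figure 3 does: draw $x$ uniformly from $D$ and accept it with probability equal to its weight, assumed here to lie in $[0, 1]$. Accepted examples then follow the distribution proportional to the weights. The function name and the `max_tries` cutoff are illustrative choices of ours; `rng` is expected to be a `random.Random` instance.

```python
def filtered_example(D, weight, rng, max_tries=10_000):
    """Rejection-sampling filter.

    Draws x uniformly from D and accepts it with probability weight(x), which
    is assumed to lie in [0, 1]; accepted examples are therefore distributed
    proportionally to weight(x).  Returns None if max_tries consecutive draws
    are rejected.
    """
    for _ in range(max_tries):
        x = rng.choice(D)
        if rng.random() < weight(x):
            return x
    return None
```

The expected number of draws per accepted example is the reciprocal of the average weight over $D$, so the procedure is fast only while the weights do not become tiny relative to their maximum.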

In the sub-sampling framework, the whole set $S$ is used throughout the boosting
process; that is, the sample size for a weak learner is fixed. Thus, the adaptivity
of our proposed weak learner loses its meaning. Also, in many practical
situations, it may not be easy to determine the size of the training set $S$ in
advance. On the other hand, the filtering framework fits our weak learner very
well, because the weak learner can control the sample size. Unfortunately,
however, the filtering framework cannot be used with AdaBoost; under its weighting
scheme, the weight of some examples may become exponentially large, and the
filtering procedure cannot produce examples within a reasonable amount of time.
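A small computation illustrates the problem. It assumes the AdaBoost-style unnormalized weight $\prod_i \beta_i^{\mathrm{cons}(h_i, x)}$ with the $\beta_i = \sqrt{(1 - 2\gamma_i)/(1 + 2\gamma_i)}$ of Figure 3, and an example that is misclassified in each of 50 rounds with advantage $\gamma_i = 0.1$; these numbers are chosen for illustration only. Its weight then exceeds 20,000. A filter would have to accept each example with probability proportional to its weight divided by the largest weight, so a single such example makes all others nearly impossible to draw; truncating the weight at 1 removes the blow-up.

```python
import math

def beta(gamma):
    """beta = sqrt((1 - 2*gamma) / (1 + 2*gamma)), for 0 < gamma < 1/2."""
    return math.sqrt((1 - 2 * gamma) / (1 + 2 * gamma))

def example_weight(cons_history, gammas, cap=None):
    """Unnormalized weight prod_i beta_i ** cons(h_i, x).

    cons_history[i] is +1 if h_i classified x correctly and -1 otherwise.
    With cap=1 the weight is truncated at 1, as in MadaBoost.
    """
    w = 1.0
    for c, g in zip(cons_history, gammas):
        w *= beta(g) ** c
    return w if cap is None else min(cap, w)
```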

In [16] we introduced a new weighting scheme, which is obtained from the one
used in AdaBoost by simply limiting the weight of each example by 1. Under this
weighting scheme, it is possible to follow the filtering framework. This boosting
algorithm, MadaBoost, implemented in the filtering framework is stated in
Figure 3. Some explanation may help in understanding some of the statements.
First look at the main program of MadaBoost. The statement “$(h_t, \gamma_t) \leftarrow
\mathrm{WeakLearn}(\mathrm{FiltEx}(t))$” means that (i) some weak learning algorithm is executed
with examples supplied by FiltEx(t), and then (ii) a hypothesis $h_t$ with
advantage $\gamma_t$ ($= \mathrm{adv}_{\mu_t}(h_t)$) is obtained. For example, our simple weak learner based
on the sequential sampling algorithm can be used here, which also gives a
reasonable estimation of the advantage $\gamma_t$. Using this advantage $\gamma_t$, the parameter
$\beta_t$ is defined next; it is used for defining the weight (i.e., probability) of each
example, as well as for determining the importance of the obtained hypothesis
$h_t$. The function $f_t$ defined at the termination of the main iteration is the final
hypothesis; intuitively, this function makes a prediction for a given example $x$ by
taking the weighted majority vote of the classifications made by the weak
hypotheses $h_1, \ldots, h_t$. MadaBoost is exactly the same as AdaBoost up to this point. The
difference is the way to compute the weight $w_t(x)$ for each example $x \in D$, which
is defined and used in the procedure FiltEx. When FiltEx(t) is called, it draws
examples $x$ randomly from $D$ and filters them according to their current weights
$w_t(x)$. In AdaBoost, the weight $w_t(x)$ is computed as $\prod_{i=1}^{t-1} \beta_i^{\mathrm{cons}(h_i, x)}$,
which may become exponentially large; in MadaBoost, we
simply limit it by 1. Then it is much easier to get one example. Furthermore, if
it gets hard to generate an example, then the boosting process (and the whole
algorithm) can be terminated. This is because it is provable (under the new
weighting scheme) that the obtained hypothesis is accurate enough when it gets
hard to generate an example by this filtering procedure.

^ Since we are assuming that the given data set D is huge, we may be able to assume
that instances there already reflect the underlying distribution; thus, we may regard
the uniform distribution on D as the original distribution.

program MadaBoost
  for t ← 1 to ∞ (until FiltEx terminates the iteration) do
    (h_t, γ_t) ← WeakLearn(FiltEx(t));
    β_t ← √((1 − 2γ_t)/(1 + 2γ_t));
  end-for
  output the following f_t:
    f_t(x) = argmax_{y ∈ Y} Σ_{i : h_i(x) = y} log(1/β_i)
end.

procedure FiltEx(t)
  loop-forever
    generate x uniformly at random from D;
    % cons(h_i, x) = 1 if h_i(x) is correct and −1 otherwise
    w_t(x) ← min{1, Π_{i=1}^{t−1} β_i^{cons(h_i, x)}};
    with probability w_t(x), output x and exit;
    if # of iterations exceeds some limit then
      exit and terminate the iteration of MadaBoost;
  end-loop
end.

Fig. 3. MadaBoost (in the Filtering Framework)
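The following Python sketch mirrors Figure 3 under our own assumptions: the data set is an in-memory list of labelled pairs, the weak learner is a hypothetical callable that pulls examples through a sampler and returns a hypothesis together with an advantage estimate $0 < \gamma_t < 1/2$, and “some limit” in FiltEx is the parameter `filter_limit`. For clarity rather than speed, the weight $w_t(x) = \min\{1, \prod_i \beta_i^{\mathrm{cons}(h_i, x)}\}$ is recomputed on every draw.

```python
import math

class _Terminate(Exception):
    """Raised by the filter when acceptable examples become too hard to find."""

def madaboost(D, weak_learn, rng, max_rounds=50, filter_limit=10_000):
    """Sketch of MadaBoost in the filtering framework, following Figure 3.

    D            : list of labelled examples (x, y), y in {+1, -1}
    weak_learn   : hypothetical callable weak_learn(sample) -> (h, gamma);
                   sample() returns one filtered example, and the learner
                   must return 0 < gamma < 1/2
    rng          : a random.Random instance
    filter_limit : rejected draws after which FiltEx stops the boosting loop
    """
    hyps, betas = [], []

    def weight(x, y):
        # w_t(x) = min{1, prod_i beta_i ** cons(h_i, x)}
        w = 1.0
        for h, b in zip(hyps, betas):
            w *= b if h(x) == y else 1.0 / b
        return min(1.0, w)

    def filt_ex():
        for _ in range(filter_limit):
            x, y = rng.choice(D)
            if rng.random() < weight(x, y):
                return x, y
        raise _Terminate

    for _ in range(max_rounds):
        try:
            h, gamma = weak_learn(filt_ex)
        except _Terminate:
            break
        hyps.append(h)
        betas.append(math.sqrt((1 - 2 * gamma) / (1 + 2 * gamma)))

    def final(x):
        # weighted vote: argmax_y of the sum of log(1/beta_i) over {i : h_i(x) = y}
        votes = {}
        for h, b in zip(hyps, betas):
            label = h(x)
            votes[label] = votes.get(label, 0.0) + math.log(1.0 / b)
        return max(votes, key=votes.get) if votes else 1

    return final
```

Termination follows the argument in the text: once most examples are classified correctly by most weak hypotheses, their capped weights shrink, FiltEx rejects almost every draw, and the loop stops.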

In [16] it is proved that the weighting scheme of MadaBoost still guarantees
polynomial-time convergence. Unfortunately, the convergence speed of MadaBoost
that can be proved in [16] is exponentially slower than that of AdaBoost. However,
our experiments [17] show that there is no significant difference in convergence
speed between AdaBoost and MadaBoost; that is, more or less the same
number of boosting steps is sufficient. On the other hand, our experiments show
that MadaBoost with our weak learner performs quite well on large data sets.
Some Comments on Related Work

The first boosting algorithm was introduced by Schapire [33] to investigate
a rather theoretical question, namely the equivalence between the strong and
weak PAC-learnability notions. But due to the success of AdaBoost in practical
applications, several boosting techniques have been proposed since AdaBoost;
see, e.g., the Proceedings of the 13th Annual Conference on Computational Learning
Theory (COLT'00).

Heuristics similar or related to boosting have also been proposed
and investigated experimentally; see, e.g., [7,13,1].

It has been said that boosting does not work well when errors or noise exist,
that is, when some of the instances in $D$ have a wrong classification
label. In particular, this problem seems serious for AdaBoost. As mentioned
above, the weighting scheme of AdaBoost changes weights rapidly, which makes
AdaBoost too sensitive to erroneous examples. Again, this problem is reduced by
using a more moderate weighting scheme like that of MadaBoost. In fact, it is shown
[16] that MadaBoost is robust to standard statistical errors, i.e., errors that could
be corrected by seeing a sufficient number of instances. (Recall that $D$ may be
a multiset and one example may appear in $D$ more than once. Then by seeing
enough instances of $D$ for the same example, we may be able to fix its label
even if some instance is labeled wrongly. See, e.g., [27] for the formal argument.)
To solve this problem of AdaBoost, Freund [21] introduced a new weighting
scheme and proposed a new boosting algorithm, BrownBoost, which is
also robust to statistical errors of the above type. Unfortunately, however,
even these algorithms may not be robust against errors that cannot be fixed by
just looking at many instances. One typical example is the case where the data set
contains exceptions, i.e., examples that should be treated as exceptional.

4 Random Sampling for Support Vector Machines

Boosting provides us with a powerful methodology for designing better learning
algorithms. Unfortunately, however, boosting algorithms may not work sufficiently
well if a given data set contains many errors or exceptions. On the other
hand, there is a popular classification mechanism, the support vector machine
(SVM in short), that works well even if the data set contains some “outliers”, i.e.,
erroneous examples or exceptions. Here again we show that random sampling
can be used for designing an efficient SVM training algorithm. (Remark. One
important point of the SVM approach is the so-called “kernel method”, which
enables us to obtain hyperplane separation in a space of much higher dimension
than the original space. In this paper, this point is omitted, and we consider
simple hyperplane separation. See our original paper [3] for an extension to
the kernel method. See also [4], which proposes another algorithm derived from a
similar approach.)

First we explain the basics of SVMs. Since we only explain what is
necessary for our discussion, see, e.g., a good textbook [11] for a more systematic and
comprehensive explanation. A support vector machine (as discussed here) is a
classification mechanism that classifies examples by using a hyperplane. This time
our data set $D$ is a set of $m$ instances in some $n$-dimensional space. Here again
we consider binary classification, and we assume that each instance in $D$ has a
binary label, $+1$ for positive and $-1$ for negative examples. Then SVM training
is to compute a hyperplane separating positive and negative examples with the
largest margin. Precisely, our goal is to solve the following optimization
problem^. (Let us assume for a while that $D$ has no erroneous instances and that positive
and negative examples can be separated by some hyperplane in the $n$-dimensional
space. To simplify our discussion, we assume here that $D$ is an ordinary set.
That is, an example appears at most once as an instance in $D$; hence, there is
no distinction between example and instance. Also we assume that the examples in
$D$ are indexed as $x_1, x_2, \ldots, x_m$, where each $x_i$ has a label $y_i$.)

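Max Margin (P1)

$$\min\ \tfrac{1}{2}\|w\|^2 - (\theta_+ - \theta_-) \quad \text{w.r.t. } w = (w_1, \ldots, w_n),\ \theta_+,\ \text{and } \theta_-,$$
$$\text{s.t. } w \cdot x_i \ge \theta_+ \text{ if } y_i = 1, \text{ and } w \cdot x_i \le \theta_- \text{ if } y_i = -1.$$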

This optimization problem is a quadratic programming (QP in short) problem.
Though QP is in general polynomial-time solvable, existing (general) QP
solvers are not efficient enough to solve large QP problems. Here random
sampling can be used. In fact, some random sampling techniques have been developed
and used for such combinatorial optimization problems (see, e.g., [18]), and we
simply use one of them. This technique yields a training algorithm that works
particularly well when the dimensionality $n$ (i.e., the number of attributes) is
moderate but the size $m$ of $D$ is huge.

The idea of the random sampling technique is in fact similar to boosting.
Pick a certain number of examples from $D$ and solve (P1) under the set
of constraints corresponding to these examples. We choose examples randomly
according to their “weights”, where initially all examples are given the same
weight. Clearly, the obtained local solution is, in general, not the global
solution, and it does not satisfy some constraints; in other words, some examples are
misclassified by the local solution. Then double the “weight” of such
misclassified examples, and pick some examples again randomly according to
their weights. If we iterate this process for several rounds, the weights of “important
examples”, i.e., the examples defining the solution hyperplane (which are called support
vectors), increase, and hence they are likely to be chosen. Note that
once all support vectors are chosen at some round, the local solution of this
round is the real one, and the algorithm terminates at this point.

More specifically, in this way we can design the SVM training algorithm stated
in Figure 4. Let us look at the algorithm in a bit more detail. In the algorithm, $w(x)$
denotes the current weight of an example $x$. Examples are drawn randomly
from $D$ so that the probability that each example $x$ is chosen is proportional to
its current weight $w(x)$. Let $R$ be the set of examples chosen in this way, and

^ Here we follow [5] and use their formulation. But the above problem can be restated 
by using a single threshold parameter as given originally in [10] . 
program RandomSamplingSolver
  set w(x) ← 1 for each x ∈ D;
  r ← 6δ²;
  repeat
    R ← choose r examples x from D randomly according to their weights w(x);
    solve (P1) for R and let (w*, θ+*, θ−*) be its solution;
    V ← the set of violators to (w*, θ+*, θ−*);
    if w(V) ≤ w(D)/(3δ) then
      double the weight w(x) of each x ∈ V;
  until V = ∅;
  output the current solution;
end.



Fig. 4. SVM Training Algorithm Based on Random Sampling 



we solve (P1) on $R$ to obtain a local solution $(w^*, \theta_+^*, \theta_-^*)$. A violator is simply
an example that is misclassified by the obtained local solution. The parameter
$\delta$ denotes the combinatorial dimension, a quantity expressing the complexity of
the combinatorial optimization problem to be solved; for the problem (P1), we
have $\delta \le n + 1$, which follows from the fact that $n + 1$ support vectors are enough
to define the solution hyperplane. Note that the number of examples chosen for
$R$ is computed from $n$ and $\delta$ ($= n + 1$) and is independent of $m$, the size of $D$.
Thus, our algorithm is appropriate if $m \gg n$, i.e., the number of examples is
much larger than the number of attributes.
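The reweighting loop of Figure 4 works with any subproblem solver. To illustrate it without a QP solver, the sketch below plugs in a deliberately tiny stand-in of our own: one-dimensional data in which the positives lie to the right of the negatives. There the solution of (P1) on $R$ is determined by the smallest positive and the largest negative point of $R$ (the two support vectors, so $\delta = 2 = n + 1$), and the toy solver returns just these two points. The toy solver, the names and the data are assumptions for illustration, not the authors' implementation; a real trainer would call a QP solver in `solve`.

```python
def solve_1d(sample):
    """Toy stand-in for solving (P1) on R: 1-D data, positives to the right.
    Returns (smallest positive, largest negative) point of the sample."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == -1]
    return (min(pos) if pos else float("inf"),
            max(neg) if neg else float("-inf"))

def violators(data, sol):
    """Indices of examples lying on the wrong side of the local solution."""
    lo_pos, hi_neg = sol
    return [i for i, (x, y) in enumerate(data)
            if (y == 1 and x < lo_pos) or (y == -1 and x > hi_neg)]

def random_sampling_solver(data, delta, solve, rng):
    """Iterative reweighting loop of Figure 4 (rng: a random.Random)."""
    w = [1] * len(data)               # w(x) <- 1 for each x in D
    r = 6 * delta * delta             # sample size r = 6 * delta^2
    rounds = 0
    while True:
        rounds += 1
        idx = rng.choices(range(len(data)), weights=w, k=r)
        sol = solve([data[i] for i in idx])
        V = violators(data, sol)
        if not V:                     # R contained all support vectors
            return sol, rounds
        if sum(w[i] for i in V) <= sum(w) / (3 * delta):
            for i in V:               # few violators: double their weights
                w[i] *= 2
```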

Note that the weights of violators are increased only when their total weight
is small, namely at most $w(D)/(3\delta)$, that is, when not many violators exist.
But by using the “Sampling Lemma” [8,18] we can prove that rounds with too many
violators (in which no weight is doubled) do not occur often. Then the following bound follows from this fact.

Theorem 5. The average number of iterations executed in the algorithm of
Figure 4 is bounded by $6\delta \ln m$.

Recall that $\delta \le n + 1$; thus, this theorem shows that the algorithm needs to
solve (P1) roughly $O(n \ln m)$ times. On the other hand, the size of $R$, and hence
the time needed to solve (P1) at each iteration, is bounded by some polynomial
in $n$.

Next consider the case where a given data set $D$ contains “outliers”, and no
hyperplane can separate the positive and negative examples in $D$. In the SVM
approach, this situation can be handled by considering a “soft margin”, that is,
by relaxing constraints with slack variables. Precisely, we consider the following
generalization of (P1).

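Max Soft Margin (P2)

$$\min\ \tfrac{1}{2}\|w\|^2 - (\theta_+ - \theta_-) + K \sum_{i=1}^{m} \xi_i \quad \text{w.r.t. } w = (w_1, \ldots, w_n),\ \theta_+,\ \theta_-,\ \text{and } \xi_1, \ldots, \xi_m,$$
$$\text{s.t. } w \cdot x_i \ge \theta_+ - \xi_i \text{ if } y_i = 1,\quad w \cdot x_i \le \theta_- + \xi_i \text{ if } y_i = -1,\ \text{and } \xi_i \ge 0.$$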
Here the variables $\xi_i \ge 0$ are newly introduced; they express the penalty for $x_i$
being misclassified by a hyperplane. That is, examples can be misclassified by
a solution in this formulation, but the penalty $\xi_i$ is added to the cost if $x_i$ is
misclassified. Intuitively, both the margin from the separating hyperplane and
the cost of misclassifications are taken into account in problem (P2). At this
point, we can formally define our notion of “outliers”: an outlier is an example
$x_i$ that is misclassified (in other words, with $\xi_i > 0$) by the solution of (P2).
Note here that we use a parameter $K < 1$ to adjust the degree of influence
of misclassified examples, and that the solution of (P2) and the set of outliers
depend on the choice of $K$.

Now our problem is to solve (P2). We assume that the parameter $K$ is
appropriately chosen [6]. We may still use the algorithm of Figure 4. The only point
we need to consider is the choice of $\delta$, the combinatorial dimension of (P2).
Since (P2) is defined with $n + m + 2$ variables, $\delta$ is bounded by $n + m + 1$ by the
straightforward generalization of the argument for (P1). But this bound is too
large; in fact, the number of examples required for $R$ becomes much larger than
the original number $m$ of examples! Fortunately, however, we can show [3] that
$\delta$ is indeed bounded by $n + \ell + 1$, where $\ell$ is the number of outliers. (The proof
uses the geometrical interpretation of (P2) given by Bennett and Bredensteiner
[5].) Then by using the bound of Theorem 5, we can conclude the following: the
algorithm needs to solve (P2) roughly $O((n + \ell) \ln m)$ times, and the time needed
to solve (P2) each time is bounded by some polynomial in $n + \ell$.
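For a fixed candidate $(w, \theta_+, \theta_-)$, the smallest feasible slacks in (P2) are $\xi_i = \max(0, \theta_+ - w \cdot x_i)$ for positive and $\xi_i = \max(0, w \cdot x_i - \theta_-)$ for negative examples. The sketch below evaluates the (P2) objective for such a candidate and lists the examples with $\xi_i > 0$, which are the outliers when the candidate is the solution of (P2). The helper name and the data layout (feature tuples paired with labels) are our own assumptions.

```python
def soft_margin_objective(w, theta_plus, theta_minus, data, K):
    """Evaluate (P2) at a candidate (w, theta_plus, theta_minus).

    For fixed w and thresholds, the optimal slacks are
      xi_i = max(0, theta_plus - w.x_i)   if y_i = +1,
      xi_i = max(0, w.x_i - theta_minus)  if y_i = -1.
    Returns (objective value, slacks, indices with xi_i > 0).
    """
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    xi = []
    for x, y in data:
        s = dot(w, x)
        xi.append(max(0.0, theta_plus - s) if y == 1 else max(0.0, s - theta_minus))
    value = 0.5 * dot(w, w) - (theta_plus - theta_minus) + K * sum(xi)
    outliers = [i for i, v in enumerate(xi) if v > 0]
    return value, xi, outliers
```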

Comments on Related Work 

The present form of SVM was first proposed by Cortes and Vapnik [10]. Many
algorithms and implementation techniques have been developed for training SVMs
efficiently; see, e.g., [35,6]. This is because solving QP for training SVMs is costly.
Among speed-up techniques, those called “subset selection” [35] have been used
as effective heuristics from the early stage of SVM research. Roughly speaking,
subset selection divides the original QP problem into small pieces,
thereby reducing the size of each QP problem. Well-known subset selection
techniques are chunking, decomposition, and sequential minimal optimization (SMO
in short). In particular, SMO has become popular because it outperforms the
others in several experiments. (See, e.g., [10,32,11] for details.) Though the
performance of these subset selection techniques has been extensively examined,
no theoretical guarantee has been given on the efficiency of algorithms based
on these techniques. On the other hand, we can theoretically guarantee the
efficiency of our algorithm based on random sampling. Compared with existing
heuristics, our algorithm may not be as efficient, but it can be combined with
the other heuristics to obtain a yet faster algorithm. Furthermore, using our
weighting scheme, it may be possible to identify some of the outliers at earlier
stages, thereby making the problem easier to solve by removing them.

The idea of using random sampling to design efficient randomized algorithms
was first introduced by Clarkson [8]. This approach has been used for solving
various combinatorial optimization problems. Indeed, a similar idea has been used
[2] to design an efficient randomized algorithm for QP. More recently, Gärtner
and Welzl [18] introduced a general framework for discussing such randomized
sampling techniques. Our algorithm of Figure 4 and its analysis for (P1)
follow immediately from their general argument.
Abstract. We show that the interactive knapsack heuristic optimization
problem is APX-hard. Moreover, we discuss the relationship between 
the interactive knapsack heuristic optimization problem and some other 
knapsack problems. 



1 Introduction 

The interactive knapsack (IK) model is a generalization of the model behind
the generalized assignment problem. An instance of the interactive knapsack
problem contains several knapsacks connected to one another: inserting an
item into one of the knapsacks also affects the other knapsacks (in a way to
be described below). Most reasonable problems using the IK model are strongly
NP-complete. One possible approach to avoid computational difficulties is to
restrict the problems to use one item at a time, leading to the interactive knapsack
heuristic optimization (IKHO) problems. Unfortunately, some IKHO problems
are NP-complete, too (see [1]).

In the IK model we have an ordered group of knapsacks, also called a knapsack
array. When an item $j$ is inserted into a knapsack $i$, it gets copied into
knapsacks $i + 1, \ldots, i + c$ ($c \ge 0$). The inserted item and its copies together are
called a clone. Moreover, an item can also “radiate”, i.e., some portions of the
item can be copied into knapsacks $i - u, \ldots, i - 1$ and $i + c + 1, \ldots, i + c + u$ ($u \ge 0$).
This behavior (clone and radiation) can model, for example, the controls made
in electricity management, like load clipping (see [1,2,3]).

We assume familiarity with the basics of complexity theory as given, e.g.,
in [8,12].

We are able to prove that IKHO is APX-hard. Moreover, we establish a con- 
nection between IKHO and the multi-dimensional 0-1 knapsack problem. This 
suggests the existence of new applications of IKHO in query optimization and 
in related areas, where the multi-dimensional 0-1 knapsack problem is applied 
[5]. 

At the end of this section we formulate IKHO handling one item at a time. 
We allow the same item to be inserted several times into the knapsack array 

* Author is grateful to Erkki Makinen for his time and comments. 



L. Pacholski and P. Ruzicka (Eds.): SOFSEM 2001, LNCS 2234, pp. 152-159, 2001. 
© Springer- Verlag Berlin Heidelberg 2001 




On the Approximability of Interactive Knapsack Problems 153 



but not into the same knapsack. Similarly, in load clipping, we can usually make 
several controls with the same utility. Other possible applications are described 

in [11- 

In the 0-1 knapsack problem [11] we want to maximize $\sum_{j=1}^{n} p_j x_j$ such that
$\sum_{j=1}^{n} w_j x_j \le b$. We have one knapsack, whose capacity is $b$ ($b > 0$). Profits $p$ and
weights $w$ are $n$-vectors of positive integers and $x$ is a binary vector ($x \in \{0, 1\}^n$).
The 0-1 knapsack problem is NP-complete but it has a fully polynomial time
approximation scheme (see [12], pp. 305-307).
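A textbook FPTAS for the 0-1 knapsack problem scales the profits down and runs a dynamic program over the scaled profits. The following Python sketch is that standard construction, written by us for illustration; we do not claim it follows the presentation in [12] line by line. Items that do not fit on their own are discarded, profits are divided by $\varepsilon p_{\max}/n$ and rounded down, and the table stores the minimum weight needed to reach every scaled profit.

```python
def knapsack_fptas(profits, weights, b, eps):
    """(1 - eps)-approximation for the 0-1 knapsack problem (eps > 0) by
    profit scaling and dynamic programming over scaled profits.
    Returns (total profit, list of chosen item indices)."""
    items = [j for j in range(len(profits)) if weights[j] <= b]  # others never fit
    if not items:
        return 0, []
    pmax = max(profits[j] for j in items)
    scale = eps * pmax / len(items)
    sp = {j: int(profits[j] // scale) for j in items}            # scaled profits
    total = sum(sp.values())
    INF = float("inf")
    # best[q] = (minimum weight, chosen items) reaching scaled profit exactly q
    best = [(0, [])] + [(INF, None)] * total
    for j in items:
        for q in range(total, sp[j] - 1, -1):                    # 0-1: iterate downwards
            prev_w, prev_set = best[q - sp[j]]
            if prev_w + weights[j] < best[q][0]:
                best[q] = (prev_w + weights[j], prev_set + [j])
    q = max(q for q in range(total + 1) if best[q][0] <= b)
    chosen = best[q][1]
    return sum(profits[j] for j in chosen), chosen
```

Rounding loses less than $\varepsilon p_{\max} \le \varepsilon \cdot \mathrm{OPT}$ in total, so the returned profit is at least $(1 - \varepsilon)\,\mathrm{OPT}$, while the table has $O(n^2/\varepsilon)$ entries, which keeps the running time polynomial in $n$ and $1/\varepsilon$.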

The generalized assignment problem (GAP) [11] maximizes $\sum_{i=1}^{m} \sum_{j=1}^{n} p_{ij} x_{ij}$
subject to $\sum_{j=1}^{n} w_{ij} x_{ij} \le b_i$ ($i = 1, \ldots, m$), $\sum_{i=1}^{m} x_{ij} \le 1$ ($j = 1, \ldots, n$), and
$x_{ij} = 0$ or $1$ ($i = 1, \ldots, m$, $j = 1, \ldots, n$). In GAP the profit and weight of an
item depend on the knapsack. Notice that all values $p_{ij}$, $w_{ij}$ and $b_i$ are positive
integers. Martello and Toth [11] call this problem LEGAP; in their terminology,
GAP is supposed to have the condition $\sum_{i=1}^{m} x_{ij} = 1$ instead of the inequality used
above.

GAP is APX-hard and it can be 2-approximated. The APX-hardness is shown 
for a very restricted case of GAP [6] . This restricted special case of GAP is used 
in Sections 2 and 3. 

The multi-dimensional knapsack problem (MDKP) [4] is similar to GAP.
However, this time the selection of an item means that we add an amount to every
knapsack. Hence, we select only items (and not the knapsack at the same time, as
in GAP). In MDKP we are to maximize $\sum_{j=1}^{n} p_j x_j$ subject to $\sum_{j=1}^{n} w_{ij} x_j \le b_i$
($i = 1, \ldots, m$), and $x_j = 0$ or $1$ ($j = 1, \ldots, n$). Lin [10] gives a
survey of some well-known non-standard knapsack problems (including MDKP).

MDKP has a PTAS if $m$ is fixed [7]. Chekuri and Khanna [5] show that
MDKP is hard to approximate within a factor of [...] for every fixed $B$, where
$B = \min_i b_i$. Srinivasan [13] shows how to obtain in polynomial time
solutions [...], where $t = \Omega(z^* [...])$ and $z^*$ is the optimal solution. We obtain
better approximations as $B$ (the smallest knapsack capacity) increases.

MDKP is closely connected to some integer programming problems. The
packing integer program (PIP) [5] coincides with MDKP [5], but we could also call it
the positive max 0-1 integer program, because all $w_{ij}$, $p_j$ and $b_i$ are
positive.

In Section 2 we present an approximation-preserving transformation from
GAP to IKHO. Moreover, we show that IKHO can be reformulated as a special case
of MDKP. In Section 3 we characterize the transformation between GAP and MDKP.

In IKHO we have $x_i \in \{0, 1\}$, profits $p_i \in \mathbb{N}$, weights $w_i \in \mathbb{N}$, lengths of clones
$c$ or $c_i \in \mathbb{N}$, lengths of radiations $u$ or $u_i \in \mathbb{N}$, and functions $I_i : \mathbb{Z} \to \mathbb{Q}$, for
knapsacks $i = 1, \ldots, m$. Moreover, we have a capacity $b_i \in \mathbb{N}$ for each knapsack
$i$, and a positive integer $K$. Interaction $I_i$ of knapsack $i$ maps the distance from
$i$ to a rational number. By using $c$ (clone) and $u$ (radiation) we mean that $I_i$
equals one for knapsacks $i, \ldots, i + c$, and is arbitrary for knapsacks $i - u, \ldots, i - 1$
and $i + c + 1, \ldots, i + c + u$. For all other knapsacks, $I_i$ is zero.
(Fixed) IKHO problem is to

$$\max \sum_{i=1}^{m} \sum_{k=i-u}^{i+c+u} I_i(k)\, p_k\, x_i \qquad (1)$$

$$\text{subject to } \sum_{i=1}^{m} w_i\, I_i(l)\, x_i \le b_l, \qquad (2)$$

$$x_k = 0 \text{ for } i < k \le i + c, \text{ when } x_i = 1, \qquad (3)$$

$$x_i = 0 \text{ or } 1, \text{ and} \qquad (4)$$

$$\sum_{i=1}^{m} x_i \le K, \qquad (5)$$

where $l = 1, \ldots, m$ in (2) and $i = 1, \ldots, m$ in (3) and (4).
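To make formulation (1)-(5) concrete, the sketch below solves a tiny instance by exhaustive search over all $x \in \{0, 1\}^m$; it is exponential in $m$ and meant only as an illustration. The assumptions are ours: knapsacks are numbered $1, \ldots, m$, interactions are given by a hypothetical callable `I(i, k)` that is zero outside the clone and radiation range $i - u, \ldots, i + c + u$, and knapsacks outside $1, \ldots, m$ are ignored.

```python
from itertools import product

def ikho_brute_force(p, w, b, I, c, K):
    """Exhaustively solve a tiny fixed-IKHO instance following (1)-(5).

    p, w, b : dicts mapping knapsack index 1..m to profit, weight, capacity
    I       : hypothetical callable I(i, k), the interaction of knapsack i on
              knapsack k, assumed zero outside i-u, ..., i+c+u
    c, K    : clone length and the bound on the number of inserted items
    Returns (optimal value, optimal assignment as a dict).
    """
    m = len(p)
    knaps = range(1, m + 1)
    best_val, best_x = None, None
    for bits in product((0, 1), repeat=m):                    # (4)
        x = dict(zip(knaps, bits))
        if sum(bits) > K:                                     # (5)
            continue
        # (3): knapsacks i+1..i+c (the clone of an inserted item) stay empty
        if any(x[i] and x[k] for i in knaps
               for k in range(i + 1, min(i + c, m) + 1)):
            continue
        # (2): interaction-weighted load on each knapsack l
        if any(sum(w[i] * I(i, l) * x[i] for i in knaps) > b[l] for l in knaps):
            continue
        val = sum(I(i, k) * p[k] * x[i] for i in knaps for k in knaps)  # (1)
        if best_val is None or val > best_val:
            best_val, best_x = val, x
    return best_val, best_x
```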

If $K = O(1)$ or $u = 0$, the above problem is solvable in polynomial time
[1]. When $K = O(m)$ and $u = m$, it becomes NP-complete [1]. This can be
seen by a transformation from the 0-1 knapsack problem. Thus, the questions
about the existence of PTAS and FPTAS algorithms for IKHO are relevant. The
NP-completeness of IKHO is an open question when $u = O(\log m)$.

We see easily that the knapsack problem built up by the radiation uses at
most $u$ different profits and weights. Thus, if $u = O(m)$, the problem remains
NP-complete ($m/c$ must be $O(m)$, of course). As noted, other values of $u$ may
or may not imply polynomial time algorithms.



2 Transformation from GAP into IKHO 

We reduce a highly restricted version of GAP to IKHO. As shown in [6], GAP
is APX-hard even on instances of the following form, for all positive $\delta$:

— $p_{ij} = 1$ for all $j$ and $i$,

— $w_{ij} = 1$ or $w_{ij} = 1 + \delta$ for all $j$ and $i$, and

— $b_i = 3$ for all $i$.

In what follows, this problem is referred to as the restricted GAP. 

The APX-hardness of the restricted GAP means that there exists $\varepsilon > 0$ such
that it is NP-hard to decide whether an instance has an assignment of $3m$ items
or whether each assignment has value at most $3m(1 - \varepsilon)$ (see [6]). In the sequel, we
suppose that $u = m$.

Theorem 1. IKHO is APX-hard. 

Proof. Given an instance of the restricted GAP (having $m$ knapsacks and $n$
items), we create an instance of IKHO as follows.

If $x_{ij} = 1$ in GAP, then $x'_{j_i} = 1$ in IKHO. In IKHO we have knapsacks $j_1, \ldots, j_m$
for each item $j$, and knapsacks $b'_1, \ldots, b'_m$ corresponding to the knapsacks in
GAP.

Interaction $I$ is defined in such a way that knapsacks $j_1, \ldots, j_i, \ldots, j_m$ will
be full when item $j_i$ is chosen (i.e., when we set $x'_{j_i} = 1$). Hence, item $j$ can be
put at most once into the IKHO array. Moreover, the interaction for knapsack $b'_i$ is
such that the weight $w_{ij}$ of GAP will be put into knapsack $b'_i$. Other knapsacks
are left intact. We need only radiation (i.e., $c = 0$), defined as

$$I_{j_i}(k) = \begin{cases} 1 + z, & \text{when } k = b'_i, \\ 1, & \text{when } k \in [j_1, j_{i-1}] \cup [j_{i+1}, j_m], \\ 0, & \text{otherwise,} \end{cases}$$

where $z = 0$ if $w_{ij} = 1$, and $z = \delta$ if $w_{ij} = 1 + \delta$.

The capacity of knapsacks $b'_1, \ldots, b'_m$ is 3, while the capacity of knapsacks
$j_1, \ldots, j_m$ is 1, for each item $j$. The profit is $1/(m+1)$ for each of the $nm$ items, because
an item with interaction involves $m + 1$ knapsacks. Hence, the profit given by
radiation is $1/(m+1)$ per knapsack, and it adds up to 1 for each item. We set $w_{j_i} = 1$ for
each item $j$ and knapsacks $j_1, \ldots, j_m$. The weight is 4 for knapsacks $b'_1, \ldots, b'_m$,
i.e., we cannot set $x'_{b'_i} = 1$ for any $i$.

Three items can be put into a knapsack $i$ in GAP if and only if three items
can be fitted into knapsack $b'_i$ through interaction. If GAP has a full assignment
of value $3m$, the corresponding IKHO instance has the same value and the knapsacks
$b'_i$ are full. Otherwise, the value is at most $3m(1 - \varepsilon)$ in GAP as well as in IKHO.
Each item in GAP can be assigned only once, and IKHO behaves similarly. □

Hence, a PTAS for IKHO would contradict the fact that the restricted GAP is APX-hard. Figure 1 shows the knapsack structure used in Theorem 1. It corresponds to a situation where we set $x_{11} = 1$ in GAP, i.e. we put the first item into the first knapsack. In IKHO we set $x_{1_1} = 1$, and radiation ensures that knapsacks $1_2, \ldots, 1_m$ are full, so that the first item of GAP cannot be put into any other knapsack. Moreover, to handle the normal GAP knapsack restriction, the interaction puts into the corresponding knapsack $b'_1$ in IKHO the same amount as in GAP. In this example, weight $w_{11}$ in GAP is $1 + \delta$.







Fig. 1. The knapsack structure used in Theorem 1. 



Theorem 1 considers the special case of IKHO where c = 0, u = m and 
K = m. We can also reformulate IKHO so that it equals a special case of 
MDKP. Thus, we can use the algorithms designed for MDKP [7,13] to solve this 
special case of IKHO. 
First, define $p'_i = \sum_{k=1}^{m} I_i(k)\, p_k$ ($i = 1, \ldots, m$). This changes our objective function (1) to be $\sum_{i=1}^{m} p'_i x_i$. Second, define $w'_{ik} = w_i I_i(k)$, for $i, k = 1, \ldots, m$.

As a result, condition (2) becomes $\sum_{i=1}^{m} w'_{ik} x_i \le b_k$ ($k = 1, \ldots, m$). Again, an item in knapsack $i$ means that we add a weight $w'_{ik}$ to each knapsack $k = 1, \ldots, m$.
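Now conditions (3) and (5) are both fulfilled. We have converted our problem into

$$
\max \sum_{i=1}^{m} p'_i x_i \tag{6}
$$

$$
\text{subject to } \sum_{i=1}^{m} w'_{ik} x_i \le b_k, \quad k = 1, \ldots, m, \tag{7}
$$

$$
x_i = 0 \text{ or } 1, \quad i = 1, \ldots, m. \tag{8}
$$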

Condition (7) is equal to $W'x \le b$, where $W'$ is an $m \times m$ matrix with non-negative elements. The above problem is a special case of MDKP containing $m$ knapsacks and $m$ items (as opposed to $n$ items).
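The conversion into (6)-(8) is a direct computation. The following Python sketch, assuming the radiation is given as a dense table `I[i][k]` (a hypothetical representation), builds $p'$ and $W'$ and solves a tiny instance by exhaustive search. The brute-force solver is only a sanity check, not one of the MDKP algorithms of [7,13].

```python
from itertools import product


def to_mdkp(I, p, w):
    """Map IKHO data (radiation I[i][k], profits p[k], weights w[i]) to MDKP.

    p'_i = sum_k I_i(k) p_k  and  w'_{ik} = w_i I_i(k), as in the text.
    """
    m = len(p)
    p_prime = [sum(I[i][k] * p[k] for k in range(m)) for i in range(m)]
    w_prime = [[w[i] * I[i][k] for k in range(m)] for i in range(m)]
    return p_prime, w_prime


def solve_mdkp_bruteforce(p_prime, w_prime, b):
    """Exhaustively solve (6)-(8); only meant for tiny sanity checks."""
    m = len(p_prime)
    best_val, best_x = 0, (0,) * m
    for x in product((0, 1), repeat=m):
        # constraint (7): sum_i w'_{ik} x_i <= b_k for every knapsack k
        if all(sum(w_prime[i][k] * x[i] for i in range(m)) <= b[k] for k in range(m)):
            val = sum(p_prime[i] * x[i] for i in range(m))
            if val > best_val:
                best_val, best_x = val, x
    return best_val, best_x
```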

As pointed out, MDKP has a PTAS for a fixed number of knapsacks but is otherwise hard to approximate. Even though IKHO seems to be easier than MDKP with an unfixed number of knapsacks, Theorem 1 rules out the possibility of having a PTAS for IKHO. Moreover, Chekuri and Khanna [5] also show that MDKP is hard to approximate even when $m = \mathrm{poly}(n)$ (recall that in our case $m = n$).

The cases of IKHO with some fixed $c > 0$ are at least as hard as the case with $c = 0$. The number of items is no longer $m$, but rather $m/(c + 1)$. However, restriction (3) is non-linear.

Since some of the algorithms for MDKP are based on linear programming relaxation [13], and their efficiency depends on the number of knapsacks (i.e. on the number of linear restrictions), restriction (3) should be efficiently transformed into a linear form. Because $c > 0$ is fixed, we can do it as follows.

By restricting $x_i + x_j \le 1$ we can ensure that at most one of $x_i$ or $x_j$ equals one. The $m$ restrictions of form (3) are transformed into $cm$ restrictions of the form

$$
x_k + x_{k+i} \le 1, \quad i = 1, \ldots, c, \; k = 1, \ldots, m. \tag{9}
$$

If $x_k = 1$, then the following $c$ knapsacks $x_i$ ($i = k + 1, \ldots, k + c$) fulfill the condition $x_i = 0$. This does not change the hardness of approximating IKHO by using MDKP.
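The $cm$ pairwise restrictions (9) can be generated mechanically. In this hypothetical Python helper, pairs whose second index would exceed $m$ are simply dropped; the text does not say how the boundary is handled, so that convention is an assumption.

```python
def clone_constraints(m, c):
    """Index pairs (k, k + i) for the restrictions x_k + x_{k+i} <= 1 of (9).

    Indices are 1-based as in the text; pairs with k + i > m are dropped,
    which is an assumed boundary convention.
    """
    return [(k, k + i) for k in range(1, m + 1) for i in range(1, c + 1) if k + i <= m]


def satisfies(x, pairs):
    """True iff the 0/1 vector x (x[0] holds x_1) meets every pair restriction."""
    return all(x[a - 1] + x[b - 1] <= 1 for a, b in pairs)
```

For $m = 5$ and $c = 2$ this yields 7 of the at most $cm = 10$ pairs; choosing knapsacks 1 and 4 is allowed, while knapsacks 1 and 3 violate the pair $(1, 3)$.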

Hence, we have shown

Theorem 2. For IKHO, we can obtain solutions of value …, where $t = z^*$ is the optimal solution and $B$ is the size of the smallest knapsack.

Theorem 2 also holds for the case where the clone length depends on the knapsack into which we put an item. Thus, in the problem setting (1)-(5) we can change every $c$ to be $c_i$.
Actually, we can reduce MDKP to IKHO as well. Let $W$ be the integer weight matrix and $p'$ the integer profit vector of MDKP. Define $w_i = 1$ and $I_i(k) = W_{ki}$, the weight of item $i$ in knapsack $k$.

Now, we have to find profits $p_k$ such that $p'_i = \sum_k I_i(k)\, p_k$, i.e. $p' = W^{T} p$. We can set $p = (W^{T})^{+} p'$, where $^{+}$ denotes a matrix pseudo-inverse (see, e.g., [9]). Calculating the pseudo-inverse (up to some predefined accuracy) takes polynomial time. The elements $p_k$ are rational numbers, but the sum $\sum_k I_i(k)\, p_k$ is close to $p'_i$ and can be rounded to an integer. Finally, set $K = m$, $c = 0$ and $u = m$.

When $x_i = 1$ in MDKP, $x_i = 1$ in IKHO. Profits will be the same (after obvious rounding), and knapsack restrictions are handled equally in both problems. Thus IKHO is no easier than MDKP.

Theorem 3. Unless NP = ZPP, IKHO is hard to approximate to within a factor of … for fixed $\varepsilon > 0$, where $B = \min_k b_k$ is fixed.

Proof. See the discussion above and the results of [5]. □

While Theorem 2 gives us methods to solve IKHO within a known precision in polynomial time, Theorem 3 sharpens the result of Theorem 1. After Theorems 1 and 3, it is natural to discuss the nature of the transformation from GAP into MDKP.



3 Transformation from GAP into MDKP 

A direct transformation from the restricted GAP into MDKP is also possible by using a construction similar to the one used in the proof of Theorem 1. By using

$$
x = (x_{11} \cdots x_{1m} \cdots x_{n1} \cdots x_{nm})^{T}
$$

($T$ stands for the matrix transpose) and by adding the row

$$
(\underbrace{0 \cdots 0}_{(j-1)m} \; \underbrace{1 \cdots 1}_{m \text{ ones}} \; 0 \cdots 0)
$$

for each item $j$ in $W$, where $W$ stands for the weight matrix, we can ensure that only one of the values $x_{j1}, \ldots, x_{jm}$ can be one. Ones occur in those positions of $b$ which correspond to the rows of $W$ described above. GAP gives the other rows directly:

$$
(0 \cdots 0 \; w_{i1} \; 0 \cdots 0 \; w_{i2} \; 0 \cdots 0 \; w_{in} \; 0 \cdots 0),
$$

for each knapsack $i$. Notice that the first non-zero value of row $i$ is at position $i$, and each subsequent non-zero occurs after $m - 1$ zeros. The size of $W$ is $(m + n) \times nm$. Vectors $p$ and $b$ are defined in the obvious manner. Again, the profit gap between an instance of the restricted GAP and MDKP is the same.
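To summarize, we have

$$
W = \begin{pmatrix}
w_{11} & 0 & \cdots & 0 & w_{12} & 0 & \cdots & 0 & \cdots & w_{1n} & 0 & \cdots & 0 \\
0 & w_{21} & \cdots & 0 & 0 & w_{22} & \cdots & 0 & \cdots & 0 & w_{2n} & \cdots & 0 \\
\vdots & & \ddots & & \vdots & & \ddots & & & \vdots & & \ddots & \\
0 & 0 & \cdots & w_{m1} & 0 & 0 & \cdots & w_{m2} & \cdots & 0 & 0 & \cdots & w_{mn} \\
1 & 1 & \cdots & 1 & 0 & 0 & \cdots & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 1 & 1 & \cdots & 1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & & & & & & & \ddots & & & & \vdots \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & \cdots & 1 & 1 & \cdots & 1
\end{pmatrix}
$$

and

$$
b = (b_1 \cdots b_m \; \underbrace{1 \cdots 1}_{n})^{T},
$$

where $b_i = 3$ ($i = 1, \ldots, m$) in the case of the restricted GAP, and

$$
p = (p_{11} \cdots p_{1m} \cdots p_{n1} \cdots p_{nm})^{T},
$$

where $p_{jk} = 1$ for all $j$ and $k$ for the restricted GAP. (Hence, MDKP is APX-hard in the general case, as already known.)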
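The transformation can be written down directly. This Python sketch assumes 0-based indices and places $x_{ji}$ (item $j$ in knapsack $i$) in column $j \cdot m + i$, matching the ordering of $x$ above; the function name is hypothetical.

```python
def gap_to_mdkp(w, b_gap):
    """Build (W, b, p) of MDKP from a GAP instance with weights w[i][j]
    (knapsack i, item j) and knapsack capacities b_gap."""
    m, n = len(w), len(w[0])
    cols = n * m                      # variable x_{ji} sits in column j*m + i
    W = []
    for i in range(m):                # knapsack rows copied from GAP
        row = [0] * cols
        for j in range(n):
            row[j * m + i] = w[i][j]
        W.append(row)
    for j in range(n):                # item rows: x_{j1} + ... + x_{jm} <= 1
        row = [0] * cols
        for i in range(m):
            row[j * m + i] = 1
        W.append(row)
    b = list(b_gap) + [1] * n
    p = [1] * cols                    # restricted GAP: every assignment is worth 1
    return W, b, p
```

For $m = 2$ and $n = 3$ the result is a $5 \times 6$ matrix, i.e. $(m + n) \times nm$, with the GAP weights of row $i$ spaced $m$ columns apart.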

If we had a PTAS for MDKP, we could use it to solve the restricted GAP. Moreover, as the above transformation also works for the general GAP (with positive values only), the PTAS could also be used for GAP. As noted, the PTAS of [7] applies only to the cases where $m$ is fixed. In our transformation, an $m \times n$ matrix of an instance of GAP is transformed into an $(m + n) \times mn$ matrix of an instance of MDKP. The transformation of GAP into MDKP introduces new knapsacks for each item, and hence the number of knapsacks in MDKP will not be fixed. Further, we can conclude the following theorem.

Theorem 4. Unless P = NP, there cannot be an approximation-preserving transformation of the restricted GAP into MDKP in which the difference between the numbers of knapsacks is at most $O(1)$.

Otherwise, we could use the PTAS of MDKP with a fixed number of knapsacks directly as a PTAS for the restricted GAP, which, in turn, is APX-hard.

4 Conclusion 

We have shown that fixed IKHO is APX-hard and equivalent to MDKP, so that the properties of MDKP apply to IKHO. Further, we characterized the relationship between GAP and MDKP.

A version of IKHO having variable clone lengths is obtained from (1)-(5) by changing $c$ to be $c_i$. We also have to modify restrictions: replace restriction (4) with

$$
x_i = \begin{cases}
1, & \text{when } c_i \ge 0,\\
0, & \text{otherwise,}
\end{cases}
$$




On the Approximability of Interactive Knapsack Problems 159 



and add restrictions

$$
c_i \ge -1, \quad c_i \text{ integer}.
$$

In addition, the interaction function $I$ takes $c_i$ as its second argument, i.e. we use $I_i(k, c_i)$. Applications similar to load clipping typically give ranges for the admitted values of $c_i$, for each $i$.

The variable clone length IKHO is harder than the one with fixed clone lengths. The transformation given in Section 2 no longer works, because the choice of the length of a clone also determines the number of restrictions (9). We will work on the open problem of finding a reasonable approximation algorithm for IKHO with variable clone lengths $c_i$. It is also open whether the variable clone length IKHO is harder to approximate than fixed IKHO.
Abstract. Brand and Zafiropulo [1] introduced the model of communicating finite-state machines to represent a distributed system connected with FIFO channels. Several different communication protocols can be specified with this simple model. In this paper we address the problem of automatically validating protocols by verifying properties such as well-formedness and absence of deadlock. Our method is based on a representation of communicating finite-state machines in terms of logic programs. This leads to efficient verification algorithms based on the ground and non-ground semantics of logic programming.



1 Introduction 

Formal methods of specification and analysis are gradually being introduced to handle the increasing complexity of communication protocols. In particular, in recent years much effort has been spent on identifying properties, independent of the functionality of a specific protocol, that can be effectively verified and that can help a designer to discover incorrect behaviors of the specification. For this purpose, Brand and Zafiropulo [1] introduced the model of communicating finite-state machines (CFSMs) to represent a distributed system connected with FIFO channels. Such a communication model can be described by using queues. Many communication protocols can be specified within this simple model. In the same paper, they also defined a number of minimal properties that well-formed protocols are expected to satisfy. Our aim is to develop tools for computer-aided validation, based on algorithms that can detect violations of such minimal properties. Our methodology is based on a representation of CFSMs in logic programming. This allows us to apply a number of efficient and theoretically well-founded techniques that have previously been developed for the analysis of logic programs. In particular, as we will discuss in the paper, it is possible to use the non-ground semantics of logic programs in order to implicitly represent sets of states. Accordingly, we have built a prototype of a validation tool based on the fixpoint operators of logic programs. This tool (implemented in SICStus Prolog) is an extension of a prototype we used for the verification of infinite-state concurrent programs without explicit communication [3]. In that work we used constraints to represent the evolution of the system variables. Here we use operations on channels (queues) as “constraints” to represent the executions of the protocols. As we will discuss in the present paper, we have used our prototype to verify well-formedness of protocols and to detect potential deadlocks. Our analysis is performed by assuming a given bound on the length of the communication channels.

Although in this paper we are mainly concerned with (backward) reachability analysis, in an extended version of this paper we will consider model checking for Protocol Validation Logic (PVL), a logic with sufficient expressive power to express the well-formedness properties of protocols.

Based on this experience, in the conclusions we discuss some ideas concerning a possible application of constraint logic programming to the validation of infinite-state communication models (e.g. systems with unbounded channels or systems with arbitrary messages).

The paper is organized as follows. We first recall the main definitions introduced in [1]. We then set the formal grounds for using logic programming as a validation model for protocols. This connection allows us to reduce the verification problem to reasoning about SLD-derivations. Finally, we discuss the verification of a sample protocol. The conclusion provides directions for future work (e.g. validation of communicating infinite-state machines) and comparisons of our method with some of the other approaches in the literature.

Related works. There exist other works relating logic programming and verification of protocols. In [5], Fribourg and Peixoto propose the model of Concurrent Constraint Automata (CCA) to specify and prove properties of (synchronous) protocols with an unbounded number of data. CCA are represented as logic programs with arithmetic constraints. Unlike [5], we use constraints that represent operations over queues, and we study reachability and safety properties of the considered protocols. In [7], XSB, a logic programming language based on tabling, is used to implement an efficient local model checker for a finite-state CCS-like value-passing language. Unlike [7], in this paper we focus on the relationships between the non-ground semantics of logic programs encoding protocols and well-formedness of protocols. In [2], pushdown automata are represented by logic programs and their properties are discussed using set-based constraint techniques. Communication primitives are not considered in such an approach. In [1], communicating finite-state machines were used for modelling protocols. The method relied on building a tree for each process, representing all its possible executions, and on studying the interrelationships between the processes. That approach cannot be fully automated and hence is not comparable to our method. Another approach is based on the PROMELA [6] validation model, in which one can directly specify processes, message channels and state variables. Protocols written in PROMELA can be verified using the validator of SPIN [6]. The SPIN validator performs a forward reachability analysis: it can either do an exhaustive state space search, or it can be used in supertrace mode [6], with a bit-state space technique. SPIN explicitly represents the states of the protocol. In contrast, sets of protocol states are implicitly represented as non-ground atoms in our system. In various settings for model checking, the implicit representation (e.g., via BDDs) has been crucial for feasibility.

Fig. 1. Protocol prot (processes USER and SERVER).

2 Communicating Finite-State Machines 

Communicating finite-state machines [1] have been introduced to represent a network of communicating processes. Each process is represented by a finite-state automaton. Each pair of communicating processes is assumed to be connected by a full-duplex, error-free FIFO channel, i.e., communication is achieved by using a pair of queues for each pair of processes. Several different communication protocols can be represented within this formalism, as we show in the rest of this section.

Protocols. Informally, a protocol specifies the possible evolution of the considered processes upon sending and receiving messages. To illustrate, consider the example in Figure 1. The process USER can pass from state READY to state WAIT by sending a request (written !REQ) to process SERVER. The server, in turn, can enter state SERVICE upon receipt of a request (written ?REQ), and so on. To simulate the full-duplex channels, there is a pair of queues connecting the two processes. More formally, we have the following definition of protocols.

Definition 1 (Protocol [1]). A protocol is a quadruple $\langle (S_i)_{i=1}^{N}, (o_i)_{i=1}^{N}, (M_{ij})_{i,j=1}^{N}, succ \rangle$ where

— $N$ is a positive integer.

— $S_1, \ldots, S_N$ are $N$ disjoint finite sets ($S_i$ represents the set of states for process $i$).

— $o_i \in S_i$ represents the initial state of process $i$.

— $M_{11}, \ldots, M_{NN}$ are $N^2$ disjoint finite sets with $M_{ii}$ empty for all $i$ ($M_{ij}$ represents the messages that can be sent from process $i$ to process $j$).

— $succ$ is a family of partial functions for each $i$ and $j$, s.t.

• $succ(s_i, !x) = s'_i$, $x \in M_{ij}$, means that process $i$ can move from state $s_i$ to state $s'_i$ by sending a message $x$ to process $j$ (written as $!x$ when the indexes are clear from the context). The message $x$ is enqueued in the queue of pending messages from process $i$ to process $j$.

• $succ(s_i, ?x) = s'_i$, $x \in M_{ji}$, means that process $i$ can move from state $s_i$ to state $s'_i$ by receiving a message $x$ from process $j$ (written as $?x$). The message $x$ must be the top element of the queue containing the pending messages from process $j$ to process $i$.

As an example, in Figure 1, $succ(\mathrm{READY}, !\mathrm{REQ}) = \mathrm{WAIT}$, $succ(\mathrm{IDLE}, ?\mathrm{REQ}) = \mathrm{SERVICE}$, and so on.

A global state is a pair $(S, C)$ where $S$ is an $N$-tuple $(s_1, s_2, \ldots, s_N) \in \prod_{i=1}^{N} S_i$ and $C$ is an $N^2$-tuple $(c_{11}, \ldots, c_{1N}, \ldots, c_{NN})$, where each $c_{ij}$ is a sequence of messages from $M_{ij}$. The message sequence $c_{ij}$ represents the content of the channel from process $i$ to process $j$. Let $(S^0, C^0) = ((o_i)_{i=1}^{N}, (\varepsilon)_{i,j=1}^{N})$ (where $\varepsilon$ is the empty sequence of messages) be the initial state of the protocol. The possible executions are described by the reflexive and transitive closure of the relation $\rightarrow$, which is defined as follows: $(S, C) \rightarrow (S', C')$ iff there exist $i$, $k$ and $x_{ik}$ such that one of the following two conditions holds.

— All the elements of $(S, C)$ and $(S', C')$ are equal except $s'_i = succ(s_i, !x)$ for $x \in M_{ik}$ and $c'_{ik} = c_{ik}\, x_{ik}$.

— All the elements of $(S, C)$ and $(S', C')$ are equal except $s'_k = succ(s_k, ?x)$ for $x \in M_{ik}$ and $c_{ik} = x_{ik}\, c'_{ik}$.

No assumptions are made on the time a process spends in a state before sending a message, nor on the time a message spends in a queue before it is delivered to its destination.
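As an illustration of the relation $\rightarrow$ (not part of the original paper), the following Python sketch enumerates the successors of a global state. It assumes the eight transitions of Figure 1 as they are encoded in the program of Figure 2, represents each channel $c_{ij}$ as a tuple with the oldest message first, and adds a channel bound at which sends block. The bound anticipates the bounded-channel assumption used later and is not part of Definition 1; all names are hypothetical.

```python
USER, SERVER = 0, 1
# (process, source state, "!" send / "?" receive, message, peer, target state)
PROT = [
    (USER, "ready", "!", "req", SERVER, "wait"),
    (USER, "wait", "?", "done", SERVER, "ready"),
    (USER, "ready", "?", "alarm", SERVER, "register"),
    (USER, "register", "!", "ack", SERVER, "ready"),
    (SERVER, "idle", "?", "req", USER, "service"),
    (SERVER, "service", "!", "done", USER, "idle"),
    (SERVER, "idle", "!", "alarm", USER, "fault"),
    (SERVER, "fault", "?", "ack", USER, "idle"),
]


def initial(n, init_states):
    """Global state (S, C): process states plus empty channels c_ij, i != j."""
    chans = {(i, j): () for i in range(n) for j in range(n) if i != j}
    return (tuple(init_states), tuple(sorted(chans.items())))


def successors(state, prot, bound):
    """Yield every (S', C') with (S, C) -> (S', C')."""
    procs, chans = state
    ch = dict(chans)
    for i, src, kind, x, j, dst in prot:
        if procs[i] != src:
            continue
        new_ch = dict(ch)
        if kind == "!":                      # send: c'_ij = c_ij x
            if len(ch[(i, j)]) >= bound:     # assumed: sends block on a full channel
                continue
            new_ch[(i, j)] = ch[(i, j)] + (x,)
        else:                                # receive: c_ji = x c'_ji
            if not ch[(j, i)] or ch[(j, i)][0] != x:
                continue
            new_ch[(j, i)] = ch[(j, i)][1:]
        new_procs = procs[:i] + (dst,) + procs[i + 1:]
        yield (new_procs, tuple(sorted(new_ch.items())))
```

From the initial state (READY, IDLE) with empty channels, exactly two moves are enabled: USER sends REQ, or SERVER sends ALARM.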

As anticipated in the introduction, in the following section we will reformulate 
the operational behaviour of CFSMs in terms of logic programs. 

3 A Simulation in Logic Programming 

The key point is to encode global states as atomic formulas. The successor functions of the CFSMs are then represented by definite clauses describing the state transitions and the modification of the queues depending on the type of communication. Our specification consists of two parts: a logic program $P_{queue}$ which specifies the communication model, and a logic program $P_{prot}$ which encodes the “successor” functions.

Communication model. Our aim is to consider the queue operations as special constraints added to the description of the transitions of the CFSMs. Let $P_{queue}$ be a logic program specifying the basic operations on a FIFO channel (queue):

— push(x, Q, Q'): Q' is the queue that results from enqueuing x in Q;

— pop(x, Q, Q'): Q' is the queue that results from dequeuing x from Q;

— empty(Q): Q is the empty queue.

Here we are not interested in a particular implementation of the three predicates 
above. We just require the implementation to be non-directional, i.e., all the 
arguments can be input as well as output arguments. 
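For a finite alphabet and a bounded queue length, the non-directional requirement can be illustrated by treating push and pop as finite relations that can be queried with any combination of bound and free arguments. This is a Python sketch under those assumptions, not the Prolog implementation used in the prototype; queues are tuples whose first element is the front, and all names are hypothetical.

```python
from itertools import product


def queues(alphabet, bound):
    """All queues (tuples, front element first) of length <= bound."""
    for n in range(bound + 1):
        yield from product(alphabet, repeat=n)


def pop_relation(alphabet, bound):
    """Triples (x, Q, Q1) such that Q1 results from dequeuing x from Q."""
    return {(q[0], q, q[1:]) for q in queues(alphabet, bound) if q}


def push_relation(alphabet, bound):
    """Triples (x, Q, Q1) such that Q1 results from enqueuing x in Q."""
    return {(x, q, q + (x,)) for q in queues(alphabet, bound - 1) for x in alphabet}


def query(relation, x=None, q=None, q1=None):
    """Solve a relation with any subset of arguments bound (None means free)."""
    return sorted(
        t for t in relation
        if (x is None or t[0] == x) and (q is None or t[1] == q) and (q1 is None or t[2] == q1)
    )
```

Querying `pop` with only the message bound enumerates all compatible pairs of queues (the "forward" use), while fixing only the resulting queue runs the same relation "backwards".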

Encoding of protocols. As discussed in [2,3], a transition rule $s \rightarrow s'$ can be represented by a binary logic program of the form $p(s) \leftarrow p(s')$, where $p$ is nothing but a tuple constructor and $s, s'$ represent the states $s$ and $s'$. We can employ a similar idea to encode CFSMs; however, we express communication by an additional invocation of a predicate in $P_{queue}$.

Formally, given a global state $(S, C)$ s.t. $S = (s_1, s_2, \ldots, s_N)$ and $C = (c_{11}, c_{12}, \ldots, c_{NN})$, we encode it as the atomic formula

$$
p(s_1, s_2, \ldots, s_N, c_{11}, c_{12}, \ldots, c_{NN})
$$

where $p$ is a fixed predicate symbol. Consider a protocol prot (as in Def. 1)

$$
\langle (S_i)_{i=1}^{N}, (o_i)_{i=1}^{N}, (M_{ij})_{i,j=1}^{N}, succ \rangle;
$$

the associated logic program $P_{prot}$ is defined by the following set of clauses:

$$
\begin{aligned}
p(S_1, \ldots, s_i, \ldots, Q_{ij}, \ldots, Q_{NN}) &\leftarrow p(S_1, \ldots, s'_i, \ldots, Q'_{ij}, \ldots, Q_{NN}),\ push(x, Q_{ij}, Q'_{ij}) && \text{whenever } succ(s_i, !x) = s'_i,\ x \in M_{ij};\\
p(S_1, \ldots, s_i, \ldots, Q_{ji}, \ldots, Q_{NN}) &\leftarrow p(S_1, \ldots, s'_i, \ldots, Q'_{ji}, \ldots, Q_{NN}),\ pop(x, Q_{ji}, Q'_{ji}) && \text{whenever } succ(s_i, ?x) = s'_i,\ x \in M_{ji};\\
init &\leftarrow p(o_1, \ldots, o_N, Q_{11}, \ldots, Q_{NN}),\ empty(Q_{11}), \ldots, empty(Q_{NN})
\end{aligned}
$$
where capitalized identifiers represent free variables. As an example, the protocol of Figure 1 is encoded as the logic program of Figure 2.

```prolog
% Qu: messages pending for USER; Qs: messages pending for SERVER.
% Initial global state: USER in ready, SERVER in idle, both queues empty.
init :- p(ready, idle, Qu, Qs), empty(Qu), empty(Qs).

% USER transitions
p(ready, S, Qu, Qs)    :- p(wait, S, Qu, Qs1),     push(req, Qs, Qs1).
p(register, S, Qu, Qs) :- p(ready, S, Qu, Qs1),    push(ack, Qs, Qs1).
p(wait, S, Qu, Qs)     :- p(ready, S, Qu1, Qs),    pop(done, Qu, Qu1).
p(ready, S, Qu, Qs)    :- p(register, S, Qu1, Qs), pop(alarm, Qu, Qu1).

% SERVER transitions
p(U, idle, Qu, Qs)     :- p(U, service, Qu, Qs1),  pop(req, Qs, Qs1).
p(U, fault, Qu, Qs)    :- p(U, idle, Qu, Qs1),     pop(ack, Qs, Qs1).
p(U, service, Qu, Qs)  :- p(U, idle, Qu1, Qs),     push(done, Qu, Qu1).
p(U, idle, Qu, Qs)     :- p(U, fault, Qu1, Qs),    push(alarm, Qu, Qu1).
```

Fig. 2. The program $P_{prot}$.

Now, given a ground SLD-derivation $\sigma = G_0, G_1, \ldots$ s.t. $G_0 = init$, let us define $\sigma^*$ as the sequence $\sigma^*_1, \sigma^*_2, \ldots$ of atoms of the form $p(t)$ obtained from $\sigma$ by getting rid of the intermediate SLD-derivation steps over $P_{queue}$, i.e., by considering only the literals produced by resolution steps over the program $P_{prot}$. Then, the following result holds.

Theorem 1. Let prot be a protocol and let $P_{prot}$ be the encoding of prot presented above. Then, for each ground derivation $\sigma$ in $P_{prot}$, $\sigma^*$ is a computation in prot, and, vice versa, given a computation $\sigma$ in prot there exists a derivation $\rho$ in $P_{prot}$ s.t. $\rho^* = \sigma$.

This result allows us to relate the fixpoint semantics (i.e. the least model) of a program with interesting properties of the encoded protocols.
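The construction of $P_{prot}$ is mechanical enough to automate. The following Python sketch prints the clauses for the protocol of Figure 1 in Prolog syntax. The variable names `S1`, `Q12`, `Q12p`, the transition-table format and the function name are hypothetical; only the channels $c_{ij}$ with $i \neq j$ are generated, since every $M_{ii}$ is empty. Channel `Q12` plays the role of `Qs` in Figure 2 and `Q21` that of `Qu`.

```python
USER, SERVER = 0, 1
# (process, source state, "!" send / "?" receive, message, peer, target state)
PROT = [
    (USER, "ready", "!", "req", SERVER, "wait"),
    (USER, "wait", "?", "done", SERVER, "ready"),
    (USER, "ready", "?", "alarm", SERVER, "register"),
    (USER, "register", "!", "ack", SERVER, "ready"),
    (SERVER, "idle", "?", "req", USER, "service"),
    (SERVER, "service", "!", "done", USER, "idle"),
    (SERVER, "idle", "!", "alarm", USER, "fault"),
    (SERVER, "fault", "?", "ack", USER, "idle"),
]


def encode(prot, n, init_states):
    """Emit the clauses of P_prot (Prolog syntax) for an n-process protocol."""
    chans = [(i, j) for i in range(n) for j in range(n) if i != j]  # M_ii is empty

    def qvar(c, primed=False):
        return f"Q{c[0] + 1}{c[1] + 1}" + ("p" if primed else "")

    def atom(states, primed=None):
        args = list(states) + [qvar(c, c == primed) for c in chans]
        return "p(" + ", ".join(args) + ")"

    clauses = []
    for i, src, kind, x, j, dst in prot:
        head = [f"S{k + 1}" for k in range(n)]
        body = list(head)
        head[i], body[i] = src, dst
        # a send touches c_ij via push, a reception touches c_ji via pop
        c, op = ((i, j), "push") if kind == "!" else ((j, i), "pop")
        clauses.append(f"{atom(head)} :- {atom(body, c)}, {op}({x}, {qvar(c)}, {qvar(c, True)}).")
    empties = ", ".join(f"empty({qvar(c)})" for c in chans)
    clauses.append(f"init :- {atom(init_states)}, {empties}.")
    return clauses
```

The first generated clause, `p(ready, S2, Q12, Q21) :- p(wait, S2, Q12p, Q21), push(req, Q12, Q12p).`, is the first USER clause of Figure 2 up to variable renaming.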

4 Fixpoint Semantics and Verification 

Let us recall some definitions of the fixpoint semantics of logic programs. Given a program $P$, let us denote by $T_P$ the immediate consequences operator associated with $P$, defined over a collection $I$ of ground atoms as follows:

$$
T_P(I) = \{ p(d) \in B \mid p(d) \leftarrow b_1, \ldots, b_n \text{ is a ground instance of a clause of } P,\ b_i \in I \text{ for } i = 1, \ldots, n,\ n \ge 0 \}.
$$

Let $[\![P]\!]$ be the least fixpoint of $T_P$ (i.e. the least model of $P$).

Now let us assume that the two programs $P_{prot}$ and $P_{queue}$ are defined over the same language (i.e. they are defined over the same constants and function symbols). As we will explain later, we are mainly interested in the following problem:

decide whether or not a given state $p(t)$ is reachable from the initial state init.

This problem is undecidable for protocols with unbounded channels, whereas it becomes decidable when only bounded channels are considered. We can characterize this property by using the fixpoint semantics of $P_{prot}$.

Theorem 2. A given state $p(t)$ is reachable from the initial state init if and only if

$$
init \in [\![P_{prot} \cup [\![P_{queue}]\!] \cup \{p(t)\}]\!].
$$

The intuition is that we first compute the least model of the program $P_{queue}$ in order to derive all the possible successful invocations of the communication primitives. Then, we use the obtained (potentially infinite) set of facts as an “oracle” to compute the least fixpoint of the program $P_{prot}$.
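For bounded channels, the ground characterization of Theorem 2 can be computed by naive iteration of $T_P$ over a finite set of ground atoms. The Python sketch below does this for the protocol of Figure 1 with channels of length at most two. It is an illustration under assumptions, not the prototype: the queue facts of $[\![P_{queue}]\!]$ are evaluated directly as tuple operations, it works on ground states rather than non-ground atoms, and `LOOPS` holds the two extra receptions of the modified protocol discussed in Section 5.

```python
from itertools import product

USER, SERVER = 0, 1
CHANS = [(USER, SERVER), (SERVER, USER)]  # c_12 and c_21
# (process, source, "!" send / "?" receive, message, peer, target), Figure 1
PROT = [
    (USER, "ready", "!", "req", SERVER, "wait"),
    (USER, "wait", "?", "done", SERVER, "ready"),
    (USER, "ready", "?", "alarm", SERVER, "register"),
    (USER, "register", "!", "ack", SERVER, "ready"),
    (SERVER, "idle", "?", "req", USER, "service"),
    (SERVER, "service", "!", "done", USER, "idle"),
    (SERVER, "idle", "!", "alarm", USER, "fault"),
    (SERVER, "fault", "?", "ack", USER, "idle"),
]
LOOPS = [(USER, "wait", "?", "alarm", SERVER, "wait"),
         (SERVER, "fault", "?", "req", USER, "fault")]


def step(state, prot, bound):
    """Successors (S', C') of a global state (S, C); sends block on a full channel."""
    procs, qv = state
    q = dict(zip(CHANS, qv))
    for i, src, kind, x, j, dst in prot:
        if procs[i] != src:
            continue
        nq = dict(q)
        if kind == "!":
            if len(q[(i, j)]) >= bound:
                continue
            nq[(i, j)] = q[(i, j)] + (x,)
        elif q[(j, i)][:1] == (x,):
            nq[(j, i)] = q[(j, i)][1:]
        else:
            continue
        yield (procs[:i] + (dst,) + procs[i + 1:], tuple(nq[c] for c in CHANS))


def ground_atoms(prot, bound):
    """All global states with channels of length <= bound (a finite universe)."""
    states = [sorted({t[1] for t in prot if t[0] == k} | {t[5] for t in prot if t[0] == k})
              for k in (USER, SERVER)]
    alpha = {c: sorted({t[3] for t in prot if t[2] == "!" and (t[0], t[4]) == c}) for c in CHANS}
    qs = {c: [q for n in range(bound + 1) for q in product(alpha[c], repeat=n)] for c in CHANS}
    for procs in product(*states):
        for qv in product(*(qs[c] for c in CHANS)):
            yield (procs, qv)


def reaches(prot, init, target, bound=2):
    """Decide init in lfp(T_P) for P = protocol clauses + queue facts + fact target."""
    atoms = list(ground_atoms(prot, bound))
    succ = {a: set(step(a, prot, bound)) for a in atoms}
    interp = set()
    while True:
        # T_P(I): the fact p(t) plus every p(s) with a clause p(s) <- p(s'), op and p(s') in I
        nxt = {target} | {a for a in atoms if succ[a] & interp}
        if nxt == interp:
            return init in interp
        interp = nxt
```

With these definitions, the state (wait, fault) with both channels empty is not reachable in the original protocol, but it becomes reachable once the two loops of the modified protocol are added.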

Such a characterization can be made effective by considering the non-ground fixpoint semantics operator [4], defined over a collection $I$ of non-ground atoms. Such an operator is defined as follows:

$$
S_P(I) = \{ p(s)\theta \mid p(s) \leftarrow b_1, \ldots, b_n \text{ is a variant of } r \in P,\ a_i \in I \text{ for } i = 1, \ldots, n,\ n \ge 0, \text{ which share no variables},\ \theta = mgu((a_1, \ldots, a_n), (b_1, \ldots, b_n)) \}.
$$




166 



Pablo Argon et al. 



Using non-ground interpretations allows us to implicitly represent collections of states. Now, let us call $[\![P]\!]_s$ the least non-ground model of $P$ (i.e. the least fixpoint of $S_P$). The following result holds: the set of ground instances of $[\![P]\!]_s$ is equal to $[\![P]\!]$. Thus, it is possible to reason both in terms of the ground and of the non-ground semantics. The non-ground characterization of the reachability problem now becomes:

$$
init \in [\![P_{prot} \cup [\![P_{queue}]\!]_s \cup \{p(t)\}]\!]_s.
$$

Actually, since the least non-ground model $[\![P]\!]_s$ captures the answer substitutions of $P$, such a fixpoint computation corresponds to a mixed backward and forward analysis of the program $P_{queue} \cup P_{prot} \cup \{p(t)\}$: we compute the fixpoint of the operator by executing the auxiliary predicates pop, push and empty in a top-down manner. In other words, in our prototype we use the following modified definition of $S_P$, where we use $Q$ to denote an invocation of pop, push, or empty:

$$
S'_P(I) = \{ p(s)\theta\alpha \mid p(s) \leftarrow p(t), Q \text{ is a variant of } r \in P,\ p(t') \in I,\ \theta = mgu(p(t), p(t')),\ Q\theta \text{ has a successful derivation with answer } \alpha \}.
$$

We prove the correctness of this definition w.r.t. $S_P$ in Appendix A. The queue operations can be seen as a “special” form of constraints, and $P_{queue}$ as the definition of an incomplete constraint solver for them.



5 Application to the Validation of Protocols 

In the following we discuss some important general properties of protocols [1]. Let us first introduce some terminology.

Well-formed protocols and stable tuples. A reception is a pair $(s, x)$, where $s$ is a state and $x$ is a message. A reception $(s, x)$ is specified iff $succ(s, ?x)$ is defined. A reception $(s, x)$ is executable iff there exists a reachable global state $(S, C)$ where, for some $i$ and $k$, $s = s_k$ and $c_{ik}$ is of the form $xY$ for some sequence $Y$. A protocol is said to be well-formed whenever each possible reception is executable if and only if it is specified. A protocol has a reception missing if a message $x$ can arrive at a process when it is in state $s$, but the protocol does not specify which state should be entered upon receiving $x$ in state $s$. An $N$-stable tuple is a global state in which all the channels are empty (i.e. there are no pending messages). Identifying non-specified executable receptions and stable tuples can only be done for some classes of protocols. In particular, this can be done for protocols with bounded channels: we say that a channel from process $i$ to process $j$ is bounded by a constant $h$ iff for every reachable global state $(S, C)$, $c_{ij}$ is a sequence of length at most $h$. A protocol is said to be bounded iff all its channels are bounded. If no such constant $h$ exists, the channel is unbounded. For a given protocol, the bound on its channels can be estimated with a method described in [1]. In the following we restrict ourselves to protocols with bounded channels.
Previous work on the analysis of protocols [8] shows that it is important for a protocol to be well-formed independently of its functionalities. Furthermore, it is important to identify stable $N$-tuples, as they are useful for detecting synchronization losses.

Validation of protocols. Protocol validation [9] is aimed at finding any violation of the requirements of the specification. Our validation procedure is based on the logic programming model of CFSMs introduced in the previous sections.

As an example, let us consider again the protocol in Figure 1 and its encoding in logic programming in Figure 2. For this protocol it is possible to estimate an upper bound on the length of the queues during the executions. Thus, in the following discussion we limit ourselves to channels whose length is at most two.

First of all we would like to check whether prot is well-formed or not. Let us define a new program $P_{dummy}$ obtained from $P_{prot}$ by adding a dummy transition for each unspecified reception (e.g., we add the clause p(ready, S, Qu, Qs) ← dummy, pop(ack, Qu, Qu1)). If dummy is reached from init, i.e.,

$$
init \in [\![P_{dummy} \cup [\![P_{queue}]\!] \cup \{dummy\}]\!],
$$

then there exists an executable reception which is not specified. Running the example on our prototype, we found two counterexamples, which correspond to the following transitions:

p(wait, S, Qu, Qs) ← dummy, pop(alarm, Qu, Qu1).
p(U, fault, Qu, Qs) ← dummy, pop(req, Qs, Qs1).

Thus, the considered protocol is not well-formed (there exist executable receptions which are not specified). Obviously, this result extends to the unbounded case. Let us remark on an interesting aspect of the use of the non-ground semantics to verify the above property. Since we assume that the definition of pop is non-directional, the invocation of pop(ack, Qu, Qu1) with Qu and Qu1 as free variables returns the most general pairs of queues $(q, q_1)$ such that ack is the top element of $q$ and $q_1$ is the result of a pop over $q$. In case we use lists of length at most two, these are Qu = [ack], Qu1 = [] and Qu = [_, ack], Qu1 = [_], where _ is a variable.
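The outcome of this check can be cross-validated with an explicit-state forward search, which is the approach taken by SPIN rather than the backward non-ground analysis described here. The following Python sketch assumes a channel bound of two and the transition table of Figure 1 (all names hypothetical); it reports executable receptions that are not specified and specified receptions that are not executable, each as (process, state, message).

```python
from collections import deque

USER, SERVER = 0, 1
CHANS = [(USER, SERVER), (SERVER, USER)]  # c_12 and c_21
# (process, source, "!" send / "?" receive, message, peer, target), Figure 1
PROT = [
    (USER, "ready", "!", "req", SERVER, "wait"),
    (USER, "wait", "?", "done", SERVER, "ready"),
    (USER, "ready", "?", "alarm", SERVER, "register"),
    (USER, "register", "!", "ack", SERVER, "ready"),
    (SERVER, "idle", "?", "req", USER, "service"),
    (SERVER, "service", "!", "done", USER, "idle"),
    (SERVER, "idle", "!", "alarm", USER, "fault"),
    (SERVER, "fault", "?", "ack", USER, "idle"),
]


def step(state, prot, bound):
    """Successors of a global state (procs, (c_12, c_21)); sends block when full."""
    procs, qv = state
    q = dict(zip(CHANS, qv))
    for i, src, kind, x, j, dst in prot:
        if procs[i] != src:
            continue
        nq = dict(q)
        if kind == "!":
            if len(q[(i, j)]) >= bound:
                continue
            nq[(i, j)] = q[(i, j)] + (x,)
        elif q[(j, i)][:1] == (x,):
            nq[(j, i)] = q[(j, i)][1:]
        else:
            continue
        yield (procs[:i] + (dst,) + procs[i + 1:], tuple(nq[c] for c in CHANS))


def reachable(prot, init, bound=2):
    """Breadth-first search over global states starting with empty channels."""
    start = (tuple(init), ((), ()))
    seen, todo = {start}, deque([start])
    while todo:
        for nxt in step(todo.popleft(), prot, bound):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def check_receptions(prot, init, bound=2):
    """(executable but unspecified, specified but unexecutable) receptions."""
    executable = set()
    for procs, qv in reachable(prot, init, bound):
        for (_sender, receiver), q in zip(CHANS, qv):
            if q:  # the front message can be received in the receiver's current state
                executable.add((receiver, procs[receiver], q[0]))
    specified = {(t[0], t[1], t[3]) for t in prot if t[2] == "?"}
    return executable - specified, specified - executable
```

On this protocol the sketch reports exactly the two missing receptions found by the prototype: ALARM arriving at USER in state WAIT, and REQ arriving at SERVER in state FAULT.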

Now, let us modify the protocol of Figure 1 as shown in Figure 3, i.e., by adding the following two clauses to the program in Figure 2:

p(wait, S, Qu, Qs) ← p(wait, S, Qu1, Qs), pop(alarm, Qu, Qu1).
p(U, fault, Qu, Qs) ← p(U, fault, Qu, Qs1), pop(req, Qs, Qs1).

i.e. we add loops to consume messages which cannot be received in the states wait and fault.

Fig. 3. Modified Protocol.

To verify that the new protocol is well-formed we can use the following method. Let us define a new program obtained from $P_{prot}$ by turning one of the clauses which specify a reception $succ(s, ?x) = s'$ into a dummy transition such that $succ(s, ?x) = dummy$. For instance, we define a program $P'_{dummy}$ by turning the clause

p(wait, S, Qu, Qs) ← p(ready, S, Qu1, Qs), pop(done, Qu, Qu1)

into the clause

p(wait, S, Qu, Qs) ← dummy, pop(done, Qu, Qu1).

Now, if dummy is reached from init, i.e.,

$$
init \in [\![P'_{dummy} \cup [\![P_{queue}]\!] \cup \{dummy\}]\!],
$$

it follows that the considered specified reception is executable. We have automatically verified this condition for each specified reception, thus proving that the new version is well-formed.
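The dummy-transition test can be sketched in the same explicit-state style: each specified reception is redirected to a hypothetical state `dummy`, and the reception is executable iff `dummy` becomes reachable. This Python version uses forward search instead of the fixpoint computation and assumes a channel bound of two; the table `LOOPS` holds the two clauses added in Figure 3.

```python
from collections import deque

USER, SERVER = 0, 1
CHANS = [(USER, SERVER), (SERVER, USER)]  # c_12 and c_21
PROT = [
    (USER, "ready", "!", "req", SERVER, "wait"),
    (USER, "wait", "?", "done", SERVER, "ready"),
    (USER, "ready", "?", "alarm", SERVER, "register"),
    (USER, "register", "!", "ack", SERVER, "ready"),
    (SERVER, "idle", "?", "req", USER, "service"),
    (SERVER, "service", "!", "done", USER, "idle"),
    (SERVER, "idle", "!", "alarm", USER, "fault"),
    (SERVER, "fault", "?", "ack", USER, "idle"),
]
LOOPS = [(USER, "wait", "?", "alarm", SERVER, "wait"),
         (SERVER, "fault", "?", "req", USER, "fault")]


def step(state, prot, bound):
    """Successors of a global state (procs, (c_12, c_21)); sends block when full."""
    procs, qv = state
    q = dict(zip(CHANS, qv))
    for i, src, kind, x, j, dst in prot:
        if procs[i] != src:
            continue
        nq = dict(q)
        if kind == "!":
            if len(q[(i, j)]) >= bound:
                continue
            nq[(i, j)] = q[(i, j)] + (x,)
        elif q[(j, i)][:1] == (x,):
            nq[(j, i)] = q[(j, i)][1:]
        else:
            continue
        yield (procs[:i] + (dst,) + procs[i + 1:], tuple(nq[c] for c in CHANS))


def reachable(prot, init, bound=2):
    start = (tuple(init), ((), ()))
    seen, todo = {start}, deque([start])
    while todo:
        for nxt in step(todo.popleft(), prot, bound):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def specified_but_unexecutable(prot, init, bound=2):
    """Receptions (process, state, message) whose dummy variant never fires."""
    unexecutable = []
    for idx, (proc, src, kind, x, peer, _dst) in enumerate(prot):
        if kind != "?":
            continue
        # P'_dummy: redirect this single reception to the hypothetical state "dummy"
        variant = prot[:idx] + [(proc, src, kind, x, peer, "dummy")] + prot[idx + 1:]
        states = reachable(variant, init, bound)
        if not any(procs[proc] == "dummy" for procs, _qv in states):
            unexecutable.append((proc, src, x))
    return unexecutable
```

For the modified protocol every specified reception fires. An artificial reception such as `(USER, "register", "?", "done", SERVER, "ready")` is never enabled, so it is the only one the sketch would report if it were added to the table.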

Let us now study reachability of stable tuples for the new protocol. Following our encoding, a stable tuple can be represented by the following clause, say STABLE:

p(s1, s2, Qu, Qs) :- empty(Qu), empty(Qs).

where $s_1 \in \{ready, register, wait\}$ and $s_2 \in \{fault, service, idle\}$. Such a tuple is reachable iff the following condition holds:

$$
init \in [\![P_{prot} \cup [\![P_{queue}]\!] \cup \{STABLE\}]\!].
$$

Executing the fixpoint computation shows that the corresponding stable tuple is reachable for the following pairs of states: (wait, fault), (register, fault), (wait, service), (ready, idle).

Note that the first tuple corresponds to a deadlock state: the two processes can only remain in states wait and fault forever, because no other messages can arrive. As a consequence, though well-formed, the considered protocol has a potentially deadlocked execution.
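Stable tuples and deadlocks of the modified protocol can be enumerated with the same explicit-state approach. This Python sketch assumes forward search, a channel bound of two and the transition tables of Figures 1 and 3; these are assumptions of the illustration rather than the prototype's method. A deadlock is taken here to be a reachable global state with no enabled transition.

```python
from collections import deque

USER, SERVER = 0, 1
CHANS = [(USER, SERVER), (SERVER, USER)]  # c_12 and c_21
PROT = [
    (USER, "ready", "!", "req", SERVER, "wait"),
    (USER, "wait", "?", "done", SERVER, "ready"),
    (USER, "ready", "?", "alarm", SERVER, "register"),
    (USER, "register", "!", "ack", SERVER, "ready"),
    (SERVER, "idle", "?", "req", USER, "service"),
    (SERVER, "service", "!", "done", USER, "idle"),
    (SERVER, "idle", "!", "alarm", USER, "fault"),
    (SERVER, "fault", "?", "ack", USER, "idle"),
]
LOOPS = [(USER, "wait", "?", "alarm", SERVER, "wait"),
         (SERVER, "fault", "?", "req", USER, "fault")]


def step(state, prot, bound):
    """Successors of a global state (procs, (c_12, c_21)); sends block when full."""
    procs, qv = state
    q = dict(zip(CHANS, qv))
    for i, src, kind, x, j, dst in prot:
        if procs[i] != src:
            continue
        nq = dict(q)
        if kind == "!":
            if len(q[(i, j)]) >= bound:
                continue
            nq[(i, j)] = q[(i, j)] + (x,)
        elif q[(j, i)][:1] == (x,):
            nq[(j, i)] = q[(j, i)][1:]
        else:
            continue
        yield (procs[:i] + (dst,) + procs[i + 1:], tuple(nq[c] for c in CHANS))


def reachable(prot, init, bound=2):
    start = (tuple(init), ((), ()))
    seen, todo = {start}, deque([start])
    while todo:
        for nxt in step(todo.popleft(), prot, bound):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def stable_tuples(prot, init, bound=2):
    """Process-state pairs of reachable global states whose channels are all empty."""
    return {procs for procs, qv in reachable(prot, init, bound) if all(len(q) == 0 for q in qv)}


def deadlocks(prot, init, bound=2):
    """Reachable global states from which no transition is enabled."""
    return {s for s in reachable(prot, init, bound) if next(step(s, prot, bound), None) is None}
```

Under these assumptions the sketch finds the same four stable pairs as the fixpoint computation, and (wait, fault) with empty channels as the only deadlock.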

6 Conclusions 

The example in the previous section has already been treated by other verification methods [1].

The novelty of our approach lies in the implicit representation of sets of reachable states of the protocol by non-ground atoms. This may be useful to reduce the state-space explosion. Furthermore, the encoding of protocols in logic programs makes the specification modular and flexible. For instance, we can easily modify the definition of $P_{queue}$ in order to specify a different bound and a different type of communication. As a comparison, we ran the example on SPIN [6], which detected the deadlock state in 0.1 s. We obtained the same execution time with our SICStus Prolog prototype based on backward analysis. However, in SPIN we cannot submit assertions about queues (for instance, to detect stable tuples) as we did in our prototype for validating the protocol. The above model can be extended in order to capture communicating infinite-state systems, as we briefly discuss below.

Channels with unbounded values. In previous work [3] we successfully applied constraint logic programming to the verification of infinite-state concurrent systems with shared memory. Constraints are useful as a symbolic representation of possibly infinite sets of states of the computation. The same idea can be applied to communicating machines if we extend the model of CFSMs and admit arbitrary messages to be exchanged by two processes. For instance, we could write a rule like:

p(P, idle, Qu, Qs) ← X > 0, p(P, service, Qu, Qs1), pop(req(X), Qs, Qs1).

to force the server to change state only upon receiving a positive integer. The combination of constraints, non-ground semantics, and forward/backward analysis might make the automatic analysis of such complex systems effective.

Finally, let us give some more hints about potential optimizations of the 
validation procedure. 

State explosion problem. An approach based on exploring the execution space of a protocol is likely to suffer from the state explosion problem, even for bounded protocols. For protocols with very large queues, one possible solution is to encode operations on queues as constraints (e.g., constraints over integer variables based on a positional representation of the queue elements). This is still a subject of research, since the encoding we have found requires non-linear constraints, which are very expensive to handle. For large protocols, an alternative testing method is the random walk method [10]. Its advantage is minimal space requirements: to simulate a step of a random walk, we need to store only the current state and a description of the actions. As future work, we would like to incorporate random partial state exploration into our tool, to be used whenever exhaustive search fails due to state explosion.
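The space argument for random walks can be made concrete with a minimal Python sketch. It is an illustration under assumptions and not the method of [10] verbatim. The system is given by a hypothetical `successors` function, which returns the enabled actions as a list of next states, and by an `is_error` predicate. The walk keeps only the current state in memory. On the toy system `ring`, whose state 3 has no enabled action, a walk of 10,000 steps reaches the deadlock with overwhelming probability.

```python
import random
from typing import Callable, Hashable, List, Optional


def random_walk(initial: Hashable,
                successors: Callable[[Hashable], List[Hashable]],
                is_error: Callable[[Hashable], bool],
                max_steps: int,
                seed: Optional[int] = None) -> Optional[Hashable]:
    """Run one random walk of at most max_steps transitions.

    Only the current state is kept in memory. Returns the state at which the
    walk stopped if it is an error state or a deadlock (no enabled action),
    and None if neither was found within max_steps.
    """
    rng = random.Random(seed)
    state = initial
    for _ in range(max_steps):
        enabled = successors(state)
        if is_error(state) or not enabled:
            return state
        state = rng.choice(enabled)  # follow one enabled action at random
    if is_error(state) or not successors(state):
        return state
    return None


def ring(s: int) -> List[int]:
    """Hypothetical toy system on states 0..4; state 3 has no enabled action."""
    return [] if s == 3 else [(s + 1) % 5, (s - 1) % 5]


print(random_walk(0, ring, lambda s: False, max_steps=10_000, seed=1))
```

In practice many independent walks would be run, each still using memory for one state only.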

Model Checking for Protocol Validation Logic. As stated in the introduction, in the extended version of this paper we will introduce a Protocol Validation Logic (PVL), which is sufficiently expressive to model the well-formedness properties of a protocol. This logic is based on the modal mu-calculus and allows us to do local model checking. The advantage is that the state space is built on the fly and need not be explored in its entirety. We defer the presentation of the logic due to lack of space.
A Proofs

Given $P = P_{prot} \cup P_{queue}$, let $I$ be a set of non-ground facts such that $[\![P_{queue}]\!]_s \subseteq I$. We want to show that the definition of $S'_P$ given in Section 2 is correct w.r.t. $S_P$, i.e., $S'_P(I) = S_P(I)$. Let us apply the definition of $S_P$: given a variant $a \leftarrow b, q$ of a rule in $P$ ($q$ defined over the predicates in $P_{queue}$) and given $c, d \in I$ (sharing no variables) such that $d \in [\![P_{queue}]\!]_s$, we have $a\sigma \in S_P(I)$ whenever there exists $\sigma$ such that $\sigma = \mathit{mgu}((b, q), (c, d))$. This implies that $\sigma = \sigma'\,\mathit{mgu}(q\sigma', d\sigma')$, where $\sigma' = \mathit{mgu}(b, c)$. Since, by hypothesis, $\mathit{var}(d) \cap \mathit{var}(b, q, c) = \emptyset$, it follows that $d\sigma' = d$ and $\sigma = \sigma'\,\mathit{mgu}(q\sigma', d)$. Thus, by a property of the S-semantics [4], $\mathit{mgu}(q\sigma', d)$ is a computed answer substitution for $P_{queue} \cup \{\leftarrow q\sigma'\}$, i.e., $a\sigma \in S'_P(I)$. The other direction can be proved in a similar way.
Abstract. The class of weak parallel machines $\mathcal{C}_{weak}$ is interesting because it contains some realistic parallel machine models, especially suitable for pipelined computations. We prove that a modification of the bulk synchronous parallel (BSP) machine model, called decomposable BSP (dBSP), belongs to the class of weak parallel machines if its computational power is restricted properly. We also correct some earlier results about pipelined parallel Turing machines, regarding their membership in the class $\mathcal{C}_{weak}$.



1 Introduction 

The bulk synchronous parallel (BSP) model, introduced by Valiant [10], is an example of the so-called bridging models of parallel computers. A good bridging model should allow portable parallel algorithms to be developed easily and implemented efficiently on real computers. The BSP model fulfils these requirements: BSP algorithms have been designed and analyzed [4, 5, 7, 8], and BSP implementations exist [3, 6]. Hence, the BSP model is a realistic model of parallel computation. The decomposable BSP model (dBSP) is an extension of the BSP model [1]. It enables exploitation of the communication locality of parallel algorithms in order to achieve an additional speedup.

Models of computation can be assigned to machine classes according to their computational power. The first machine class [9] contains the Turing machine (TM) and other sequential models, e.g., RAM. Given an algorithm, its time complexity differs only up to a polynomial factor when executed on different models from the first class. The second machine class [12] is the class of massively parallel computers, e.g., PRAM. The time complexity of a problem on computers from the second class is polynomially related to its space complexity on the Turing machine. Machines from the second class offer an exponential speedup in comparison with the first class but, unfortunately, they are infeasible if physical laws of nature are taken into account. Hence we would like to have a class of parallel computers that compute faster than members of the first machine class, yet are realistic, i.e., physically feasible. One such class is the class of weak parallel machines $\mathcal{C}_{weak}$, defined by Wiedermann in [13]. Members of class $\mathcal{C}_{weak}$ provide fast pipelined computation, i.e., fast solution of many instances of the same problem. A typical feature of weak parallel models is restricted communication. For example, in a single step of a parallel Turing machine, information can be transferred from a tape cell only to its immediate neighbour. In dBSP, any communication pattern is allowed, but local communication (limited to small clusters of processors) runs faster and is therefore preferred. This characteristic, communication locality, is exhibited by many real parallel computers. This observation supports the hypothesis that the weak parallelism of $\mathcal{C}_{weak}$ members could be a good characterization of realistic parallel computers.

In this paper, we answer the question whether dBSP computers belong to the class of weak parallel machines. In Sect. 2, we define the class $\mathcal{C}_{weak}$ and present a representative weak parallel machine, the pipelined parallel Turing machine (PPTM). We correct some claims from [13] about the relation between PPTMs and class $\mathcal{C}_{weak}$. Section 3 contains an analysis of the membership of the dBSP model in the class of weak parallel machines. Due to space constraints, some details are omitted in this paper, but they can be found in the author's doctoral thesis, available as [2].

2 Class of Weak Parallel Machines 

In this section, we define the class of weak parallel machines and study pipelined Turing machines. This provides a basis for the definition and analysis of pipelined dBSP computers in the following section.

A pipelined computation processes a sequence of $N$ inputs (instances) of the same problem. We usually assume that all inputs are of the same size $n$. The time complexity $T(n)$ of a pipelined computation is the time needed to process one input, i.e., the time between reading the first symbol of input $i$ and producing the output of problem instance $i$. Space complexity is the total memory consumed by all instances. The period is the time between the beginnings of reading two subsequent inputs or (equivalently) between the ends of writing two subsequent outputs. All three complexity measures are defined as depending only on the size $n$ of a single input, not on the number of inputs $N$. Efficient pipelined computers belong to the class of weak parallel machines [13], whose definition resembles the definition of the second machine class [12] of parallel computers.

Definition 1. The class of weak parallel machines $\mathcal{C}_{weak}$ contains machines with their periods $P(n)$ polynomially equivalent to the space complexity $S(n)$ of a deterministic sequential Turing machine solving the same problem, i.e., $P(n) = O(S^k(n))$ and $S(n) = O(P^l(n))$ for some positive constants $k$ and $l$.

A (non-pipelined) 1-dimensional 1-tape parallel Turing machine (PTM) [13] is based on a nondeterministic sequential Turing machine (NTM). The computation starts with only one processor (the head together with the control unit realizing the transition relation is called a processor). Whenever the NTM would make a nondeterministic choice of performing one of the instructions $i_1, i_2, \ldots, i_b$, the PTM creates $b - 1$ new processors and each of the $b$ processors executes a different instruction from the alternatives. Then all the processors work independently, but they share the same tape. The processors run synchronously, performing one step per time unit. Several processors may not simultaneously write different symbols into the same tape cell.

A pipelined parallel Turing machine (PPTM), defined in [13], has, in addition to the working tape, a two-dimensional read-only input tape and a one-dimensional write-only output tape. The $i$-th input word $w_i$ is written from the beginning of the $i$-th row of the input tape. The machine prints its $i$-th output to the $i$-th cell of the output tape. The inputs are read in the order $w_1, w_2, \ldots$ and the outputs are printed in the same order. The number of steps made by the PPTM between reading the first symbol of $w_i$ and printing the $i$-th output depends only on the length $n$ of $w_i$. The PPTM halts after printing the last output.

The PPTM is uniform if and only if it eventually starts cycling (repeating the 
same configuration) with period P(n) on a sufficiently long sequence of identical 
inputs. 

The following two lemmas providing mutual simulation algorithms between 
sequential and pipelined parallel TMs can be found in [13]. 

Lemma 1. A Turing machine computing in time $T(n)$ and space $S(n)$ can be simulated by a PPTM with period $P(n) = O(S(n))$, time $O(T(n))$ and space $O(T(n))$.

The idea is to store several instances (each occupying S{n) cells) on the work- 
ing tape. In each parallel step, one step of the sequential algorithm is simulated 
for each instance and the tape content is shifted by one cell towards the end of 
the tape. 
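The schedule behind Lemma 1 can be sketched in Python. This sketch models only the timing and does not model the tape contents. As an assumption of the sketch, the shift of the tape by $S(n)$ cells between two admissions is represented by the parameter `period`. Instance $k$ enters at time $k \cdot$ `period`, and in every time unit each active instance performs one step of the sequential algorithm. The functions `step`, `is_done` and `output` are hypothetical placeholders for that algorithm, and every instance is assumed to need at least one step.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

C = TypeVar("C")  # configuration of one sequential computation


def pipelined_run(inputs: List[C], step: Callable[[C], C],
                  is_done: Callable[[C], bool], output: Callable[[C], object],
                  period: int) -> List[Tuple[int, object]]:
    """Timing of the Lemma 1 schedule (tape contents are not modelled).

    Instance k enters the pipeline at time k * period; in every time unit,
    each active instance performs one step of the sequential algorithm.
    Returns (finish time, output) for each instance, in input order.
    """
    assert period >= 1
    active: List[list] = []          # [instance index, configuration]
    results: Dict[int, Tuple[int, object]] = {}
    t, next_input = 0, 0
    while len(results) < len(inputs):
        if next_input < len(inputs) and t == next_input * period:
            active.append([next_input, inputs[next_input]])
            next_input += 1
        for slot in active:          # one parallel step for all instances
            slot[1] = step(slot[1])
        t += 1
        still_running = []
        for idx, cfg in active:
            if is_done(cfg):
                results[idx] = (t, output(cfg))
            else:
                still_running.append([idx, cfg])
        active = still_running
    return [results[k] for k in range(len(inputs))]


def step(cfg):
    """Hypothetical sequential algorithm: add one list element per step."""
    rest, acc = cfg
    return rest[1:], acc + rest[0]


inputs = [([1, 2, 3, 4], 0), ([5, 6, 7, 8], 0), ([0, 0, 0, 1], 0)]
print(pipelined_run(inputs, step, lambda c: not c[0], lambda c: c[1], period=2))
# [(4, 10), (6, 26), (8, 1)]
```

Here each instance needs $T = 4$ steps, so consecutive outputs appear `period` time units apart, at times $4, 6, 8$.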

Lemma 2. Let $M$ be a uniform PPTM computing with period $P(n)$. Then there exists a sequential Turing machine $M'$ with a separate input tape computing in space $S(n) = O(P(n))$, which gets a configuration of $M$ on its input tape and checks that $M$ returns into the same configuration after $P(n)$ steps.

The simulation algorithm uses the fact that the content of cell $i$ after $P(n)$ steps depends only on the initial state of cells $i - P(n), \ldots, i + P(n)$. Thus only $O(P(n))$ cells have to be stored on the working tape of $M'$ at any given time.
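The locality argument of Lemma 2 can be illustrated with a one-dimensional cellular automaton, which stands in for the PPTM here. The assumption of the sketch is that, as with the heads of a PPTM, information moves at most one cell per time unit. `cell_after` recomputes one cell after `steps` steps from a window of only $2 \cdot$ `steps` $+ 1$ initial cells. The printed check compares it with a full simulation on a blank (0) background. Rule 110 is an arbitrary choice.

```python
from typing import Callable, List

Rule = Callable[[int, int, int], int]


def shrink_step(cells: List[int], rule: Rule) -> List[int]:
    """One synchronous step; the two border cells are dropped because their
    new values would depend on cells outside the list."""
    return [rule(cells[k - 1], cells[k], cells[k + 1]) for k in range(1, len(cells) - 1)]


def evolve(tape: List[int], rule: Rule, steps: int) -> List[int]:
    """Reference: the whole tape after `steps` steps, with blanks (0) around it."""
    cells = [0] * steps + tape + [0] * steps
    for _ in range(steps):
        cells = shrink_step(cells, rule)
    return cells


def cell_after(tape: List[int], rule: Rule, i: int, steps: int) -> int:
    """Cell i after `steps` steps, using only the 2*steps+1 initial cells
    i-steps, ..., i+steps (the 'light cone' of cell i)."""
    window = [tape[k] if 0 <= k < len(tape) else 0
              for k in range(i - steps, i + steps + 1)]
    for _ in range(steps):
        window = shrink_step(window, rule)
    return window[0]


def rule110(left: int, centre: int, right: int) -> int:
    return (110 >> (4 * left + 2 * centre + right)) & 1


tape = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1]
print(all(cell_after(tape, rule110, i, 5) == evolve(tape, rule110, 5)[i]
          for i in range(len(tape))))  # True
```

The memory used by `cell_after` grows with `steps` only, not with the length of the tape. This is the property that lets $M'$ work in space $O(P(n))$.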

Lemmas 1 and 2 seem to support the claim of [13] that a uniform PPTM is a member of class $\mathcal{C}_{weak}$. But we refute this claim by the following theorem and its corollary.

Theorem 1. Every sequential Turing machine (TM) algorithm that halts on each input can be simulated by a uniform PPTM algorithm with period $P(n) = O(n)$.

Proof. The PPTM simulation algorithm consists of two phases. In the precomputation phase, the results of TM computations are computed and stored on the working tape for all possible inputs of length $n$. During the computation phase, a simple table lookup is performed for each input. An input is compared (sequentially) to each table element and, if the input matches, the corresponding output is used. A single comparison of one input takes linear time and there are exponentially many possible inputs, but exponentially many comparisons (on different inputs) are done in parallel. Hence, the period is $O(n)$. □
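The two phases of this proof can be sketched in Python. Binary inputs are assumed, and the hypothetical function `parity` stands for the simulated halting TM algorithm. All work of the TM moves into the precomputation phase, which does not depend on the inputs that arrive later. A dictionary lookup is only a sequential stand-in for the parallel comparisons performed by the PPTM.

```python
from itertools import product
from typing import Callable, Dict, Hashable, List, Tuple

Word = Tuple[int, ...]


def precompute(n: int, algorithm: Callable[[Word], Hashable]) -> Dict[Word, Hashable]:
    """Precomputation phase: run the halting sequential algorithm once on every
    binary input of length n (2**n table entries)."""
    return {w: algorithm(w) for w in product((0, 1), repeat=n)}


def computation_phase(table: Dict[Word, Hashable], inputs: List[Word]) -> List[Hashable]:
    """Computation phase: one table lookup per input (a sequential stand-in
    for the PPTM's parallel comparisons)."""
    return [table[w] for w in inputs]


def parity(w: Word) -> int:
    """Hypothetical example algorithm."""
    return sum(w) % 2


table = precompute(3, parity)
print(len(table))                                         # 8
print(computation_phase(table, [(1, 0, 1), (1, 1, 1)]))  # [0, 1]
```

The size of the table grows as $2^n$. This exponential precomputation is what Corollary 1 exploits.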
Corollary 1. The uniform PPTM is not a model belonging to the class of weak
parallel machines. 

Proof. It suffices to prove that there is a uniform PPTM computing with period $P(n)$ such that it cannot be simulated by a Turing machine in space $O(P^k(n))$ for any constant $k > 0$. Consider a problem with TM space complexity $S(n) = \omega(n^k)$ for all $k > 0$ (such decidable problems exist by the space hierarchy theorem). This means that there is no TM which solves the problem in space less than $S(n)$. A TM algorithm for this problem can be simulated by a PPTM with period $P(n) = O(n)$ according to Theorem 1. A simulation of the PPTM on the TM in space $O(P^k(n)) = O(n^k)$ yields a contradiction to the assumption about $S(n)$. □

Now it is clear that the power of a uniform PPTM must be restricted for it to become a weak parallel machine. We define two such restrictions: limited and strictly pipelined PTMs.

Definition 2 (Limited PPTM). A uniform PPTM with period $P(n)$ is called a limited PPTM if and only if the computation cycle¹ contains a configuration $C$ such that there is a sequential Turing machine algorithm working in space $S(n) = O(P^k(n))$ for some constant $k > 0$, which gradually prints the cells of $C$ from left to right² to its output tape, given the input of size $n$.

Theorem 2. Limited PPTM is a member of the class of weak parallel machines. 

Proof. A simulation of a space $S(n)$ sequential computation on a PPTM with period $P(n) = O(S(n))$ is provided by Lemma 1. The algorithm is limited, because the PPTM tape contains configurations of a space-bounded sequential machine. Therefore parts of the tape corresponding to individual instances can be generated by the sequential algorithm in space $O(S(n))$.

The reverse simulation uses the algorithm from Lemma 2 to check that the PPTM returns into the same configuration after a period. The special configuration $C$ (required to exist by the definition of the limited PPTM) is tested, because it can be generated in space $S(n) = O(P^k(n))$. If a new cell is needed by the cycle-testing algorithm, it is obtained by running the algorithm generating $C$ until one new cell is printed. □

Definition 3 (Strictly pipelined PTM). A uniform PPTM is called strictly 
pipelined, if and only if its working tape can be partitioned into subsets of cells 
pertinent to individual instances (each cell is pertinent to at most one instance) 
and no information is ever shared by several instances (passed from one instance 
to another). 

The above definition is only informal; see [2] for a precise, rather technical formal definition.

Theorem 3. Strictly pipelined PPTM is a member of the class of weak parallel 
machines. 

¹ which the computation is required to enter due to uniformity
² assuming the working tape starts on the left and stretches to the right
Proof. During one cycle (induced by uniformity), a new input is read and the computation of a new instance begins. A new instance does not know how many instances have already started before it, therefore it must enter the cycle; otherwise the cycling behavior would not be guaranteed. During one period, i.e., $P(n)$ steps, the instance can allocate up to $O(P(n))$ tape cells. As cycling is required, these cells must be freed again in the subsequent $O(P(n))$ steps for use by the next instance. At the same time, only $O(P(n))$ new cells can be allocated. The instance cannot utilize more than $O(P(n))$ cells at any time and hence the computation of one instance can be simulated by a non-pipelined PTM in space $O(P(n))$. A transformation to a sequential TM algorithm working in space $S(n) = O(P^k(n))$ for some constant $k > 0$ can be found in [13].

The reverse simulation of a space $S(n)$ Turing machine on the pipelined PTM with period $P(n) = O(S(n))$, as described in Lemma 1, satisfies the restriction of the strictly pipelined PTM. □

3 Pipelined Decomposable BSP 

The Bulk Synchronous Parallel computer [8, 10] consists of $p$ processors with local memories. Every processor is essentially a RAM. The processors can communicate by sending messages via a router (some communication and synchronization device). The computation runs in supersteps, i.e., the processors work asynchronously, but are periodically synchronized by a barrier. A superstep consists of three phases: computation, communication, and synchronization. In the computation phase, the processors compute with locally held data. The communication phase consists of the realization of a so-called $h$-relation, i.e., processors send point-to-point messages to other processors so that no processor sends or receives more than $h$ messages. Data sent in one superstep are available at their destinations from the beginning of the next superstep. In the final phase of each superstep, all the processors perform a barrier synchronization.

The performance of the router is given by two parameters: $g$ (the ratio of the time needed to send or receive one message to the time of one elementary computational operation, i.e., the inverse of the communication throughput) and $l$ (the communication latency and the synchronization overhead). Both $g$ and $l$ are non-decreasing functions of $p$. If a BSP computation consists of $s$ supersteps and the $i$-th superstep is composed of $w_i$ computational steps in every processor and of an $h_i$-relation, then the time complexity of the computation is defined by

$$T = \sum_{i=1}^{s} (w_i + h_i g + l) = W + Hg + sl, \quad \text{where } W = \sum_{i=1}^{s} w_i \text{ and } H = \sum_{i=1}^{s} h_i.$$

We always assume the logarithmic cost of individual instructions. 
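The BSP cost formula can be evaluated with a few lines of Python. The superstep profile below is hypothetical. The logarithmic instruction cost is not modelled: each $w_i$ is assumed to be already expressed in time units. The second printed line checks the identity $T = W + Hg + sl$.

```python
from typing import Sequence


def bsp_time(w: Sequence[int], h: Sequence[int], g: int, l: int) -> int:
    """BSP cost T = sum_i (w_i + h_i * g + l) over s = len(w) supersteps."""
    if len(w) != len(h):
        raise ValueError("need one w_i and one h_i per superstep")
    return sum(wi + hi * g + l for wi, hi in zip(w, h))


w, h, g, l = [10, 20, 5], [3, 1, 4], 2, 7  # hypothetical superstep profile
T = bsp_time(w, h, g, l)
print(T)                                      # 72
print(T == sum(w) + sum(h) * g + len(w) * l)  # True: T = W + Hg + sl
```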

A Decomposable Bulk Synchronous Parallel computer with $p$ processors and communication parameters $g$ and $l$, denoted $\mathrm{dBSP}(p, g, l)$, is a $\mathrm{BSP}(p, g, l)$ computer with some modifications and the additional instructions split and join.

During a superstep, a processor can issue the instruction split($i$), where $i \in \{0, \ldots, p - 1\}$. If a processor calls split, then all other processors must call split exactly once in the same superstep. Beginning from the next superstep, the machine is partitioned into clusters (submachines) $C_1, \ldots, C_d$, where $d$ is the number of different values of $i$ in the instructions split($i$). Processors which specified the same $i$ in the split instruction belong to the same cluster, while different values of $i$ imply different clusters. Communication is restricted to processors in the same cluster: sending a message from a processor in one cluster to a processor in another cluster is forbidden. Cluster $C_i$ can be further recursively decomposed using the instruction split($j$), which assigns the calling processor to subcluster $C_{ij}$.

The instruction join called by a processor cancels the last level of decomposition which involved the calling processor. All the processors in all the sibling clusters (those originating from the same split operation) must call join exactly once in the same superstep. Only one level of join is allowed in a single superstep. After a join, the machine can be decomposed again.
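The cluster structure created by split and join can be sketched in Python as bookkeeping only. The class `Clusters` and its methods are hypothetical names introduced for illustration. Each processor records the sequence of labels $i$ of its split($i$) calls, and processors with equal label sequences form one cluster. The sketch checks the "all processors must call" conditions, but it has no notion of supersteps. It therefore does not enforce "exactly once per superstep" or "one join level per superstep".

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Path = Tuple[int, ...]


class Clusters:
    """Hypothetical bookkeeping of dBSP clusters: each processor records the
    labels i of its split(i) calls; equal label paths form one cluster."""

    def __init__(self, p: int) -> None:
        self.path: List[Path] = [() for _ in range(p)]

    def clusters(self) -> Dict[Path, List[int]]:
        groups: Dict[Path, List[int]] = defaultdict(list)
        for pid, path in enumerate(self.path):
            groups[path].append(pid)
        return dict(groups)

    def split(self, labels: Dict[int, int]) -> None:
        """labels[pid] = i means that processor pid calls split(i)."""
        for members in self.clusters().values():
            calls = [pid in labels for pid in members]
            if any(calls) and not all(calls):
                raise ValueError("every processor of the cluster must call split")
        for pid, i in labels.items():
            self.path[pid] += (i,)

    def join(self, pids: Iterable[int]) -> None:
        """Cancel the last decomposition level of the calling processors."""
        callers = set(pids)
        for pid in callers:
            if not self.path[pid]:
                raise ValueError("processor %d is not in a split cluster" % pid)
            parent = self.path[pid][:-1]
            siblings = {q for q, path in enumerate(self.path)
                        if len(path) == len(parent) + 1 and path[:-1] == parent}
            if not siblings <= callers:
                raise ValueError("all processors of sibling clusters must call join")
        for pid in callers:
            self.path[pid] = self.path[pid][:-1]

    def may_communicate(self, a: int, b: int) -> bool:
        return self.path[a] == self.path[b]


c = Clusters(4)
c.split({0: 0, 1: 0, 2: 1, 3: 1})
print(c.clusters())             # {(0,): [0, 1], (1,): [2, 3]}
print(c.may_communicate(1, 2))  # False
c.join(range(4))
print(c.clusters())             # {(): [0, 1, 2, 3]}
```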

The time complexity of a dBSP computation is

$$T = \sum_{i=1}^{s} \bigl(w_i + h_i g(p_i) + l(p_i)\bigr) = W + \sum_{i=1}^{s} \bigl(h_i g(p_i) + l(p_i)\bigr),$$

where $p_i$ is the size (number of processors) of the largest non-partitioned cluster existing in superstep $i$. An important property of dBSP is that partitioning into small clusters yields small values of $g(p_i)$ and $l(p_i)$ and thus faster communication.
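The effect of partitioning on the dBSP cost can be seen in a small Python sketch. The router parameters $g(p) = p$ and $l(p) = 10p$ are hypothetical. They were chosen only because they are non-decreasing in $p$, as the model requires. The two calls compare the same superstep on an unpartitioned 16-processor machine and after splitting it into clusters of size 2.

```python
from typing import Callable, Sequence

CostFn = Callable[[int], int]


def dbsp_time(w: Sequence[int], h: Sequence[int], sizes: Sequence[int],
              g: CostFn, l: CostFn) -> int:
    """dBSP cost T = sum_i (w_i + h_i * g(p_i) + l(p_i)), where sizes[i] is p_i,
    the size of the largest non-partitioned cluster in superstep i."""
    if not (len(w) == len(h) == len(sizes)):
        raise ValueError("need w_i, h_i and p_i for every superstep")
    return sum(wi + hi * g(pi) + l(pi) for wi, hi, pi in zip(w, h, sizes))


def g(p: int) -> int:  # hypothetical inverse throughput
    return p


def l(p: int) -> int:  # hypothetical latency and synchronization cost
    return 10 * p


print(dbsp_time([5], [4], [16], g, l))  # 229: unpartitioned 16-processor machine
print(dbsp_time([5], [4], [2], g, l))   # 33: same superstep in clusters of size 2
```

With $p_i = p$ in every superstep, the same function gives the plain BSP cost.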

A PPTM is constructed from a non-pipelined PTM by adding input and output tapes capable of holding many inputs and outputs. Similarly, input and output arrays of registers can be added to the dBSP model, and pipelined dBSP is obtained. A pipelined dBSP computer (pipe-dBSP) is a dBSP computer augmented with a pair of two-dimensional arrays $I$ and $O$ of read-only input and write-only output registers. The $j$-th value of the $i$-th input word $w_i$ can be read from register $I_{i,j}$ and the $j$-th value of the $i$-th output is written to register $O_{i,j}$. The special registers isel and osel are used to choose the particular input or output, i.e., only $I_{\mathit{isel},j}$ and $O_{\mathit{osel},j}$ can be accessed. Although logarithmic cost is used, the time of an I/O instruction depends only on the value of $j$, not on isel or osel. The initial value of registers isel and osel is 0 and the only operation permitted on them is incrementing their values by one. The increment is performed in unit time. This provides access to the inputs in the order $w_1, w_2, w_3, \ldots$, with later inputs (with large indices) read as fast as $w_1$. Only the first $O(n^k)$ processors may access the I/O registers, where $k > 0$ is a constant.

Individual instructions of a BSP computer can take different numbers of time units, and a BSP machine computes (and thus can read input or write output) only during a part of each superstep. Moreover, we want to be able to pack several periods into a single superstep. Thus we slightly relax the definition of time and period of a pipelined computation.

Definition 4. A computation of a pipelined machine on inputs of length $n$ runs in time $T(n)$ and with period $P(n)$ if and only if the $N$-th output is printed after at most $T(n) + (N - 1)P(n)$ time units since the beginning of the computation.
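Definition 4 can be checked against an output trace with a one-line Python function. The trace below is hypothetical: three outputs are written together at the end of each superstep of length 10, so several periods are packed into one superstep. Under the relaxed definition such a trace runs with period 4 but not with period 3.

```python
from typing import Sequence


def within_bound(output_times: Sequence[int], T: int, P: int) -> bool:
    """Definition 4: the N-th output (N = 1, 2, ...) must be printed
    at the latest at time T + (N - 1) * P."""
    return all(t <= T + (N - 1) * P for N, t in enumerate(output_times, start=1))


trace = [10, 10, 10, 20, 20, 20]  # hypothetical output times
print(within_bound(trace, T=10, P=4))  # True
print(within_bound(trace, T=10, P=3))  # False: 4th output at 20 > 10 + 3 * 3
```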

We also relax the notion of uniformity. We say that a pipelined dBSP machine is uniform either if it is uniform in the original sense (see Sect. 2), i.e., cycling with period $P(n)$, or if it originated from such a cycling algorithm by packing several periods into a single superstep³.

³ This transformation only saves synchronization time, which is $l(p)$ per superstep.
Pipelined dBSP is a powerful model, because an analog of Th. 1 and Cor. 1
can be proved for it. The proofs are omitted, because they exploit the same tech- 
nique of building a table of all possible input/output pairs in a precomputation 
phase and simple table lookups during the rest of the computation. 

Theorem 4. Every sequential RAM algorithm that halts on each input can be simulated by a pipelined dBSP computer with period $P(n) = O(n^2)$.

Corollary 2. A pipelined dBSP computer is not a member of the class of weak 
parallel machines. 

No parallelism is required: a single processor suffices to obtain the fast pipelined algorithm. The period is $O(n^2)$, not $O(n)$ as in the PPTM version, because $n$ input bits can be distributed in $O(n)$ input registers. The contents of these registers must be multiplied together to get an index into the lookup table. The multiplication takes time $O(n^2)$ in the worst case.

The similarity of Theorems 1 and 4 indicates that some restriction of the pipelined dBSP model is necessary for it to become a weak parallel machine. We define limited and strictly pipelined dBSPs as we did for PPTMs in Sect. 2. Then we prove that these restricted dBSPs belong to the class of weak parallel machines.

The restriction defining a limited pipe-dBSP machine is based on the bounded sequential space needed for computing a configuration from the cycle required by uniformity. Thus, a space-efficient sequential simulation is made possible.

Definition 5 (Limited pipe-dBSP). A pipe-dBSP machine with period $P(n)$ is called a limited pipe-dBSP machine if and only if it is uniform and its computation cycle contains a configuration $C$ such that there is a RAM algorithm working in space $S(n) = O(P^k(n))$ for some constant $k > 0$, which computes the contents of the $i$-th register of the $j$-th processor in configuration $C$, given the input of size $n$ and the numbers $i, j$.

Theorem 5. Limited pipe-dBSP with $l(p) = O(p^k)$ for some $k > 0$ is a member of class $\mathcal{C}_{weak}$.

Proof. Limited pipe-dBSP computation with period $P(n) = O(S^2(n))$:

Consider a sequential RAM algorithm with time complexity $T(n)$ and space complexity $S(n)$. Each of the $p = T(n)/S(n)$ dBSP processors stores $q = T^k(n)$ instances in its local memory. A superstep comprises computing the next $T(n)/p$ steps of the sequential algorithm for all $qp$ unfinished instances. Then every processor sends its instances to the next processor. The oldest $q$ instances are finished and their outputs are written to the output registers. Simultaneously, $q$ new inputs are read from the input registers.

Every processor uses $O(qS(n))$ registers. The maximum number of bits in an address is therefore $\log(qS(n)) = \log q + \log S(n)$ instead of $\log S(n)$ in the RAM algorithm. Hence an instruction can be slowed down relative to its RAM counterpart by a factor of $\log q$. Communication is done in two phases. The computer is twice repartitioned into pairs of processors. In the first phase, each processor with an even pid sends its data to the successive odd processor. Data from odd to even processors are sent in the second phase.

From the time needed by a single computational superstep and its related communication supersteps, which together constitute $q$ periods,

$$qP(n) = \frac{T(n)}{p}\, q \log q + 2l(p) + S(n)\, q \log q + \log p + qS(n)g(2) + 2l(2),$$

and after substituting for $p$ and $q$, we obtain the length of the period $P(n) = O(S(n) \log T(n))$. The well-known relation $T(n) = 2^{O(S(n))}$ between the time and space complexities of a halting RAM algorithm yields $P(n) = O(S^2(n))$.

The algorithm cycles with cycle length qP(n). Every configuration consists 
of RAM configurations in different stages of processing and therefore can be 
generated by a sequential algorithm in space S(n). Hence the simulation can 
be performed by a limited pipe-dBSP. 

Simulation of a limited pipe-dBSP computer on a RAM:

The RAM machine generates configuration $C$, which exists in the cycle by definition. Then it simulates one period and checks that the dBSP returns to $C$. The largest value manipulated during the period (and thus the highest addressable register and the number of processors) is bounded by $2^{O(P(n))}$ due to the logarithmic cost. A single dBSP period can be simulated by a PRAM in time polynomial in $P(n)$, using polynomially many (in $p$) additional processors for handling $h$-relations. The PRAM is a member of the second machine class [11] and thus its time (polynomial in $P(n)$ in our case) is polynomially related to sequential space, which is therefore also polynomial in $P(n)$. The corresponding simulation algorithm is started in configuration $C$, which can be generated in space $O(P^k(n))$. Hence the total space needed is polynomial in $P(n)$. □
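The two-phase communication used in the proof of Theorem 5 can be sketched in Python. As an assumption of the sketch, a "block" stands for the $q$ instances stored by one processor, and register contents and costs are not modelled. The sketch shows that moving every block to the next processor needs only two rounds of messages, each inside clusters of size 2. This explains why the communication terms in the period involve $g(2)$ and $l(2)$.

```python
from typing import List, Tuple, TypeVar

B = TypeVar("B")  # the block of q instances held by one processor


def two_phase_shift(blocks: List[B], new_block: B) -> Tuple[List[B], B]:
    """Move every processor's block to its successor using only clusters of size 2.

    blocks[j] is held by processor j. Phase 1 uses the pairs {0,1}, {2,3}, ...
    (even j sends to j+1); phase 2 uses the pairs {1,2}, {3,4}, ...
    (odd j sends to j+1). Processor 0 takes new_block (fresh inputs) and the
    block of the last processor leaves the pipeline (finished instances).
    """
    if not blocks:
        raise ValueError("at least one processor is needed")
    p = len(blocks)
    received: List[B] = list(blocks)   # every entry is overwritten below
    for j in range(0, p - 1, 2):       # phase 1: even -> odd
        received[j + 1] = blocks[j]
    for j in range(1, p - 1, 2):       # phase 2: odd -> even
        received[j + 1] = blocks[j]
    received[0] = new_block
    return received, blocks[-1]


print(two_phase_shift(["a", "b", "c", "d", "e"], "x"))
# (['x', 'a', 'b', 'c', 'd'], 'e')
```

In the first phase an odd processor temporarily holds two blocks: its own and the one received from its even neighbour. It forwards its own block in the second phase.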

Definition 6 (Strictly pipelined dBSP). A pipelined dBSP computer with period $P(n)$ is strictly pipelined if and only if its registers can be partitioned into subsets of registers pertinent to individual instances, no information is ever shared by several instances, and at most $O(P^k(n))$ work⁴, for some constant $k > 0$, is done on registers pertinent to a single instance during a single period.

Theorem 6. Strictly pipelined dBSP with $l(p) = O(p^k)$ for some $k > 0$ is a member of class $\mathcal{C}_{weak}$.

Proof. Strictly pipelined dBSP computation with period $P(n) = O(S^2(n))$:
The algorithm from the proof of Th. 5 can be used, because it is strictly 
pipelined. Each instance is processed in a separate subset of registers and 
no value computed by an instance is reused during computation of another 
instance. 

Simulation of a strictly pipelined dBSP computer on a RAM: 

The strictly pipelined dBSP machine starts cycling with period $P(n)$, because it is uniform. A new input word is read and the computation of the new instance begins during the cycle. The new instance has no information about the other instances, thus it must enter the cycle to ensure cyclic behavior.

⁴ total computational steps performed by all processors working on this instance
Limited work allows the allocation of only up to $O(P^k(n))$ bits of memory. This memory must be freed for the next instance during the next period. Hence, the total memory consumed by a single instance is $O(P^k(n))$. The simulation algorithm takes the input and progressively executes all periods (cycles) with this input until the output is produced. Simulation of a single period uses the same method as in the previous proof and utilizes a number of RAM registers polynomial in $P(n)$. As no information from the registers pertinent to other instances can be utilized, only $O(P^k(n))$ registers pertinent to the single instance being processed have to be stored from one cycle to the next. Other registers can be assumed to contain 0. □

4 Conclusion 

Our goal was to analyze pipelined computations on the dBSP machine model. To do so, we have used the framework of weak parallel machines. We have presented a definition of the class $\mathcal{C}_{weak}$. Pipelined parallel Turing machines, defined and analyzed in [13], are already known candidates for being weak parallel machines. In [13], membership of PPTMs in class $\mathcal{C}_{weak}$ was claimed. The proof was based on the ideas mentioned in Lemmas 1 and 2. In this paper, we identified a problem in that proof: the ability to simulate a single period in a space-efficient way does not yield a simulation of the whole computation in small space. As we have shown in Cor. 1, a major part of the computation can be performed in its initial phase, before the machine starts cycling. Hence, a uniform PPTM is not a weak parallel machine, but a suitably restricted PPTM becomes a member of $\mathcal{C}_{weak}$. We have defined two possible restrictions: limited and strictly pipelined PPTMs.

The second type of machine model studied in this paper is the pipelined Decomposable BSP computer. The situation in this case is similar to that of PPTMs. A general dBSP machine is too powerful (even without any parallelism, i.e., with only one processor used) and therefore is not a weak parallel machine. Two modifications of the pipelined dBSP model, limited and strictly pipelined dBSPs, have been introduced and their membership in $\mathcal{C}_{weak}$ proved. They are based on the same ideas as the corresponding limited and strictly pipelined PPTMs. This result indicates that dBSP is a viable realistic model of parallel computation. We have used dBSP and not BSP machines, because the algorithm for simulating an $S(n)$-bounded sequential computation in period $P(n)$ needs fast communication. A dBSP machine can be partitioned into clusters of size 2, thus the time of an $h$-relation is $hg(2) + l(p)$ instead of the much larger BSP time $hg(p) + l(p)$. Note that $h = T^k(n)S(n)$ is large in our case.

An important (but not surprising) observation is that if some data are computed once and then reused during a pipelined computation, then the total time needed to process all instances can be made significantly smaller. It could be interesting to study the relation between pipelined computation and another recently emerging paradigm of computing: interactive machines. Like a pipelined computer, an interactive machine processes a (potentially infinitely) long sequence of input data, produces corresponding output data, and is able to store information in its internal memory for later use.
Abstract. The paper adds the one-counter one-way finite automaton [6] to the list of classical computing devices having quantum counterparts that are more powerful in some cases. Specifically, two languages are considered: the first is not recognizable by deterministic one-counter one-way finite automata, the second is not recognizable with bounded error by probabilistic one-counter one-way finite automata, but each is recognizable with bounded error by a quantum one-counter one-way finite automaton. This result contrasts with the case of one-way finite automata without a counter, where it is known [5] that the quantum device is actually less powerful than its classical counterpart.



1 Introduction 

As a general background for the present paper we refer to Gruska's monograph [4]. Analogues of well-known classical computing devices such as finite automata and Turing machines have been defined and studied, often with surprising results. Thus, for example, the quantum Turing machine introduced by Deutsch [3] proved to have the same computational power as the classical deterministic one. The 2-way quantum automata of Kondacs and Watrous [5], on the other hand,
recognize all regular languages and some non-regular languages such as 0”10”, 
and are thus more powerful than their 2- way probabilistic counterparts. The 
same authors showed, however, that the opposite relation holds for the quantum 
one-way finite automata, which can recognize only a proper subset of regular 
languages. It cannot recognize for example the language {0, 1}*1. 

In this paper we show that the situation changes again if the one-way automaton is supplied with a counter. Quantum automata with a counter were recently considered by Kravtsev [6]. We present two languages recognizable by a quantum one-counter one-way finite automaton. Classical one-way one-counter automata cannot recognize the first deterministically, and cannot recognize the second probabilistically with bounded error.



2 Definitions 

Automata. See Gruska [4] for general background and Kondacs and Watrous [5] for technical background material on quantum finite automata. The quantum automata with a counter that we consider here were studied by Kravtsev [6], to whom we also refer for formal definitions. Here we recall only the general ideas of the model.

To begin with, a deterministic one-way automaton with counter (1D1CA) is specified by the following components:

- a finite input alphabet $\Sigma$;
- a finite set $Q$ of states with a distinguished initial state;
- two disjoint sets $Q_a$ and $Q_r$ of accepting and rejecting states;
- a counter that may hold an arbitrary integer;
- a transition function $\delta$ that updates the state and the counter at each step of the computation as input letters are read from the tape.

The value of $\delta$ depends on the current state, the letter read, and whether the counter is zero or non-zero, but not on the exact value of the counter. The counter is set to 0 at the beginning of the computation, and each update changes it by at most one. The automaton processes each letter of the input word exactly once until the last letter of the word is reached. If the automaton is then in an accepting state, the word is accepted. If it is in a rejecting state, the word is rejected. Formally, $\delta$ is a mapping
$$\delta : Q \times \Sigma \times S \to Q \times D,$$
where the elements of $S = \{0, 1\}$ indicate whether the counter is zero or not, and $D = \{-1, 0, 1\}$ contains the amount by which the counter's value may change.
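To make the definition concrete, here is a minimal Python sketch of a 1D1CA run. The dictionary encoding of $\delta$ (keys `(state, letter, s)`, values `(new_state, d)`) and the small example automaton are illustrative assumptions, not part of the formal model of [6].

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical encoding: delta[(state, letter, s)] = (new_state, d), where
# s = 0 if the counter is zero, s = 1 otherwise, and d is in {-1, 0, 1}.
Delta = Dict[Tuple[str, str, int], Tuple[str, int]]


def run_1d1ca(delta: Delta, start: str, accepting: Set[str], word: Iterable[str]) -> bool:
    """Simulate a deterministic one-way one-counter automaton on `word`."""
    state, counter = start, 0                 # the counter starts at 0
    for letter in word:                       # each letter is read exactly once
        s = 0 if counter == 0 else 1          # only zero / non-zero is visible
        state, d = delta[(state, letter, s)]
        if d not in (-1, 0, 1):
            raise ValueError("the counter may change by at most one per step")
        counter += d
    return state in accepting


# Illustrative automaton: accept words over {a, b} in which no prefix has more
# b's than a's. 'a' increments, 'b' decrements, and a 'b' read while the
# counter is zero leads to the trap state 'dead'.
EXAMPLE: Delta = {
    ("ok", "a", 0): ("ok", 1),
    ("ok", "a", 1): ("ok", 1),
    ("ok", "b", 1): ("ok", -1),
    ("ok", "b", 0): ("dead", 0),
    **{("dead", c, s): ("dead", 0) for c in "ab" for s in (0, 1)},
}
```

For instance, `run_1d1ca(EXAMPLE, "ok", {"ok"}, "aabb")` accepts. For `"abba"`, the third letter is a `b` read with a zero counter, so the word is rejected.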

A probabilistic version of this type of automaton (1P1CA) is obtained in the usual way. Essentially, the states of the deterministic case are treated as point masses in a larger set of probability measures, to which the evolution of the automaton is then extended. Formally, $\delta$ is now a mapping
$$\delta : Q \times \Sigma \times S \times Q \times D \to \mathbb{R}_{\ge 0},$$
where $\delta(q, \sigma, s, q', d)$ is the probability of the following step: starting from state $q$ with some counter value (zero when $s = 0$, non-zero when $s = 1$), reading the letter $\sigma$, moving to state $q'$, and changing the counter by $d$. The function $\delta$ must satisfy the condition
$$\sum_{q' \in Q,\ d \in D} \delta(q, \sigma, s, q', d) = 1$$
for each $q \in Q$, $\sigma \in \Sigma$, $s \in \{0, 1\}$.
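The normalization condition can be checked mechanically. The sketch below assumes a dictionary encoding `delta[(q, sigma, s, q2, d)] = probability` (our own choice) and uses exact rational arithmetic.

```python
from collections import defaultdict
from fractions import Fraction


def is_stochastic(delta) -> bool:
    """Check the 1P1CA condition: for every (q, sigma, s) the probabilities
    delta[(q, sigma, s, q2, d)] are non-negative and sum to 1 over (q2, d).

    Only triples (q, sigma, s) that occur in the keys are checked; a complete
    automaton must list at least one entry for every triple."""
    totals = defaultdict(Fraction)
    for (q, sigma, s, q2, d), prob in delta.items():
        if prob < 0 or d not in (-1, 0, 1):
            return False
        totals[(q, sigma, s)] += prob
    return all(total == 1 for total in totals.values())
```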

Finally, for the one-way quantum one-counter automaton (1Q1CA), the alphabet is supplemented by the end markers ⊢ (left) and ⊣ (right), and the transition function
$$\delta : Q \times \Gamma \times S \times Q \times D \to \mathbb{C}$$
is complex-valued and assumed to satisfy certain well-formedness conditions. Here $\Gamma = \Sigma \cup \{\vdash, \dashv\}$. The counter and the actions on it are the same as in the classical version. Language recognition then works roughly as follows. For each letter $\sigma$ of the word extended by the end markers, two actions are performed:

1) a unitary operator $U_\sigma$ is applied to the current state of the automaton, and

2) a measurement is applied to the resulting state.

The measurement has three possible outcomes, with probabilities determined by the resulting state. The word is either accepted or rejected (and the computation stops in both cases), or the computation proceeds with the next letter.

More formally, we describe the dynamics of a quantum automaton with a counter as the evolution of a quantum system defined as follows.

Quantum state. The state of a 1Q1CA is a normalized vector of the Hilbert space $\ell_2(Q \times \mathbb{Z})$. Its basis vectors $|q, k\rangle$ correspond to the possible configurations of the classical automaton, that is, to the combinations of a state $q \in Q$ and a counter value $k \in \mathbb{Z}$. The state of a quantum automaton is therefore a linear combination, or superposition, $\sum_{i,k} \alpha_{ik} |q_i, k\rangle$ of basis vectors, where the squared moduli of the complex amplitudes $\alpha_{ik} \in \mathbb{C}$ sum up to one: $\sum_{i,k} |\alpha_{ik}|^2 = 1$.

Evolution. On a basis vector $|q, k\rangle$, the operator $U_\sigma$ is defined in terms of $\delta$ by
$$U_\sigma |q, k\rangle = \sum_{q', d} \delta(q, \sigma, \operatorname{sign}(k), q', d)\, |q', k + d\rangle,$$
where $\operatorname{sign}(k) = 0$ if $k = 0$ and $1$ otherwise. $U_\sigma$ is extended by linearity to any superposition of basis states. To ensure that $U_\sigma$ is unitary, $\delta$ must satisfy certain well-formedness conditions.

In this paper we consider only the so-called simple one-way one-counter quantum automata. For these, the well-formedness conditions are as follows [6]. For each $\sigma \in \Gamma$ and $s \in \{0, 1\}$ there is a linear unitary operator $V_{\sigma,s}$ on the inner product space $\ell_2(Q)$, and there is a function $B : Q \times \Gamma \to \{-1, 0, 1\}$, such that for each $q, q' \in Q$
$$\delta(q, \sigma, s, q', d) = \begin{cases} \langle q' | V_{\sigma,s} | q \rangle & \text{if } B(q', \sigma) = d, \\ 0 & \text{otherwise,} \end{cases}$$
where $\langle q' | V_{\sigma,s} | q \rangle$ denotes the coefficient of $|q'\rangle$ in $V_{\sigma,s} |q\rangle$. In other words, $\delta$ is determined by:

1) a set of finite unitary matrices mapping states from $Q$ to states from $Q$, one for each input letter and each of the two cases (counter zero or non-zero);

2) the change of the counter value, which is determined only by the letter read and the state from $Q$ that the automaton moves into.
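For a simple automaton, $U_\sigma$ acts as $V_{\sigma,\operatorname{sign}(k)}$ on the state component, followed by a counter shift that depends only on the target state. A minimal Python sketch, under stated assumptions: superpositions are stored as a dictionary from `(state, counter)` to amplitude, and `V` and `B` are hypothetical dictionary encodings of the objects above.

```python
import math
from typing import Dict, Tuple

State = Dict[Tuple[str, int], complex]  # basis vector |q, k>  ->  amplitude


def apply_letter(psi: State, letter: str, V, B) -> State:
    """Apply U_letter of a simple 1Q1CA to the superposition psi.

    V[(letter, s)][q][q2] is the entry <q2|V_{letter,s}|q>, and
    B[(q2, letter)] in {-1, 0, 1} is the counter change when entering q2."""
    out: State = {}
    for (q, k), amp in psi.items():
        s = 0 if k == 0 else 1                       # sign(k)
        for q2, coeff in V[(letter, s)][q].items():
            key = (q2, k + B[(q2, letter)])          # |q2, k + d>
            out[key] = out.get(key, 0) + amp * coeff
    return {key: a for key, a in out.items() if abs(a) > 1e-12}


# Toy example: a Hadamard-like V for both counter cases; entering q1 increments.
h = 1 / math.sqrt(2)
HAD = {"q0": {"q0": h, "q1": h}, "q1": {"q0": h, "q1": -h}}
V_EX = {("a", 0): HAD, ("a", 1): HAD}
B_EX = {("q0", "a"): 0, ("q1", "a"): 1}
```

Starting from $|q_0, 0\rangle$, one application gives amplitude $1/\sqrt{2}$ on both $|q_0, 0\rangle$ and $|q_1, 1\rangle$, and the norm stays 1.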

Measurement. An observation of the superposition $\sum_{i,k} \alpha_{ik} |q_i, k\rangle$ has three possible outcomes:

- the word is accepted with probability $P_a = \sum_{q_i \in Q_a,\, k} |\alpha_{ik}|^2$;
- the word is rejected with probability $P_r = \sum_{q_i \in Q_r,\, k} |\alpha_{ik}|^2$;
- otherwise the computation continues with the renormalized state
$$\frac{\sum_{q_i \notin Q_a \cup Q_r,\, k} \alpha_{ik} |q_i, k\rangle}{\sqrt{\sum_{q_j \notin Q_a \cup Q_r,\, k} |\alpha_{jk}|^2}}.$$

The computation halts in the first two cases.

The total probability of accepting a word is the sum of the probabilities of accepting it at each step. The total probability of rejecting a word is the sum of the probabilities of rejecting it at each step.
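A single measurement step can be sketched as follows, using the same hypothetical `(state, counter) -> amplitude` dictionary encoding. To accumulate total probabilities over several steps, each step's $P_a$ and $P_r$ would have to be weighted by the probability of not having halted before. The sketch shows only one step.

```python
import math
from typing import Dict, Optional, Set, Tuple

State = Dict[Tuple[str, int], complex]  # |q, k>  ->  amplitude


def measure(psi: State, acc: Set[str], rej: Set[str]) -> Tuple[float, float, Optional[State]]:
    """One measurement: return (P_a, P_r, renormalized non-halting state).

    The third component is None when no non-halting amplitude survives."""
    p_acc = sum(abs(a) ** 2 for (q, _k), a in psi.items() if q in acc)
    p_rej = sum(abs(a) ** 2 for (q, _k), a in psi.items() if q in rej)
    rest = {key: a for key, a in psi.items() if key[0] not in acc and key[0] not in rej}
    norm = math.sqrt(sum(abs(a) ** 2 for a in rest.values()))
    if norm == 0:
        return p_acc, p_rej, None
    return p_acc, p_rej, {key: a / norm for key, a in rest.items()}
```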



Languages. Let $\Sigma$ be a finite alphabet. For $S \subseteq \Sigma$ define the 'projection' map $\pi_S : \Sigma^* \to S^*$, which acts on words over $\Sigma$ by deleting all letters not in $S$. When $S$ is given explicitly as $\{\sigma_1, \ldots, \sigma_n\}$, we write $\pi_{\sigma_1, \ldots, \sigma_n}$. Note, in particular, that the length $|\pi_\sigma(x)|$ counts the occurrences of a letter $\sigma \in \Sigma$ in a word $x \in \Sigma^*$.

The language $L_1$. Consider the alphabet $\Sigma = \{0, 1, 2, ♭, 3, 4, 5, ♯\}$ with two special symbols ♭ and ♯. Let $\Sigma_1 = \{0, 1, 2, \flat\}$ and $\Sigma_2 = \{3, 4, 5, \sharp\}$. We define the languages $L_1^1, L_1^2, L_1^3$ as follows:
$$L_1^1 = \{w \in \Sigma^* : \pi_{\Sigma_1}(w) = x \flat y,\ x \in \{0,1\}^*,\ y \in \{2\}^* \text{ and } |\pi_0(x)| = |\pi_1(x)|\}$$
$$L_1^2 = \{w \in \Sigma^* : \pi_{\Sigma_1}(w) = x \flat y,\ x \in \{0,1\}^*,\ y \in \{2\}^*,\ |\pi_0(x)| = |\pi_1(x)| + |\pi_2(y)| \text{ and } |\pi_2(y)| > 0\}$$
$$L_1^3 = \Sigma^* \setminus (L_1^1 \cup L_1^2)$$

Similarly, for the alphabet $\Sigma_2$:
$$L_2^1 = \{w \in \Sigma^* : \pi_{\Sigma_2}(w) = x \sharp y,\ x \in \{3,4\}^*,\ y \in \{5\}^* \text{ and } |\pi_3(x)| = |\pi_4(x)|\}$$
$$L_2^2 = \{w \in \Sigma^* : \pi_{\Sigma_2}(w) = x \sharp y,\ x \in \{3,4\}^*,\ y \in \{5\}^*,\ |\pi_3(x)| = |\pi_4(x)| + |\pi_5(y)| \text{ and } |\pi_5(y)| > 0\}$$
$$L_2^3 = \Sigma^* \setminus (L_2^1 \cup L_2^2)$$

The language $L_1$ is formally defined as follows:
$$L_1 = (L_1^1 \cap L_2^2) \cup (L_1^2 \cap L_2^1). \qquad (1)$$

The language $L_2$. Additional symbols $\alpha, \beta_1, \beta_2$ are added to the alphabet of $L_1$. A word is in $L_2$ if it has the form
$$x_1 \alpha y_1 x_2 \alpha y_2 x_3 \alpha y_3 \ldots \alpha y_{n-1} x_n \alpha \qquad (2)$$
with $x_1, x_2, \ldots, x_n \in L_1$, where $y_i = \beta_1$ iff $x_i \in L_1^1 \cap L_2^2$, and $y_i = \beta_2$ iff $x_i \in L_1^2 \cap L_2^1$.
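A brute-force membership test is handy for checking these definitions on small examples. The sketch below is a classical reference implementation, not an automaton. It rests on assumptions of our own:

- the stand-in letters `A`, `B`, `C` encode $\alpha, \beta_1, \beta_2$;
- words are assumed to use only the letters of the alphabets above.

```python
FLAT, SHARP = "\u266d", "\u266f"  # the special symbols ♭ and ♯


def project(word: str, letters: set) -> str:
    """pi_S: delete every letter of `word` that is not in S."""
    return "".join(c for c in word if c in letters)


def kind(word: str, zero: str, one: str, two: str, sep: str) -> int:
    """Return t in {1, 2, 3} such that word lies in L^t for the sub-alphabet
    {zero, one, two, sep}, e.g. ('0', '1', '2', FLAT) for Sigma_1."""
    p = project(word, {zero, one, two, sep})
    if p.count(sep) != 1:
        return 3
    x, y = p.split(sep)
    if set(x) - {zero, one} or set(y) - {two}:
        return 3                                 # not of the form x sep y
    n0, n1, n2 = x.count(zero), x.count(one), y.count(two)
    if n0 == n1:
        return 1
    if n2 > 0 and n0 == n1 + n2:
        return 2
    return 3


def in_L1(word: str) -> bool:
    pair = (kind(word, "0", "1", "2", FLAT), kind(word, "3", "4", "5", SHARP))
    return pair in {(1, 2), (2, 1)}


def in_L2(word: str, alpha: str = "A", beta1: str = "B", beta2: str = "C") -> bool:
    if not word.endswith(alpha):
        return False
    chunks = word[:-1].split(alpha)              # x1, y1 x2, ..., y_{n-1} x_n
    xs, ys = [chunks[0]], []
    for chunk in chunks[1:]:
        if not chunk or chunk[0] not in (beta1, beta2):
            return False
        ys.append(chunk[0])
        xs.append(chunk[1:])
    for i, x in enumerate(xs):
        if beta1 in x or beta2 in x or not in_L1(x):
            return False
        if i < len(ys):                          # y_i must match the type of x_i
            want = beta1 if kind(x, "0", "1", "2", FLAT) == 1 else beta2
            if ys[i] != want:
                return False
    return True
```

For example, `"031♭♯5"` lies in $L_1^1 \cap L_2^2$ and `"0♭234♯"` lies in $L_1^2 \cap L_2^1$. Consequently `"031♭♯5" + "AB" + "0♭234♯" + "A"` is in $L_2$, while the same word with `C` in place of `B` is not.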

3 Results 

Theorem 1. The language $L_1$ cannot be recognized by deterministic one-counter one-way automata, but it can be recognized with bounded error by a quantum one-counter one-way automaton.

Proof. To prove the first claim, assume to the contrary that a deterministic one-counter one-way finite automaton recognizing $L_1$ exists and has $k$ states. Consider the words $X_{ij} = 0^i 3^j 1^i 4^{j-1} \flat \sharp 5$, where $1 \le j \le i \le n$. Clearly, all $X_{ij} \in L_1$, since their $\Sigma_1$-part lies in $L_1^1$ and their $\Sigma_2$-part lies in $L_2^2$.

When the first part $0^i 3^j$ of the word $X_{ij}$ has been read, the absolute value of the counter is at most $i + j \le 2n$. So at this stage the automaton can be in at most $k(4n + 1)$ different configurations (state and counter value). There are, however, $\frac{1}{2} n(n+1)$ words $X_{ij}$ in total. Thus, if $n$ is large enough, two different words $X_{i_1 j_1}$ and $X_{i_2 j_2}$ would, at this step of the computation, share the same state and counter value. But then, clearly, the automaton would also accept the words $0^{i_1} 3^{j_1} 1^{i_2} 4^{j_2 - 1} \flat \sharp 5$ and $0^{i_2} 3^{j_2} 1^{i_1} 4^{j_1 - 1} \flat \sharp 5$, neither of which is in the language $L_1$.
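The pigeonhole step can be made quantitative. A $k$-state automaton has at most $k(4n+1)$ configurations once $0^i 3^j$ has been read, since the counter value lies between $-2n$ and $2n$. There are $n(n+1)/2$ words $X_{ij}$. The small sketch below (helper names are ours) finds the first $n$ at which the words outnumber the configurations.

```python
def words_exceed_configurations(k: int, n: int) -> bool:
    """True if there are more words X_ij (1 <= j <= i <= n) than
    configurations (state, counter value in [-2n, 2n]) of a k-state automaton."""
    words = n * (n + 1) // 2
    configurations = k * (4 * n + 1)
    return words > configurations


def smallest_n(k: int) -> int:
    """Smallest n for which the pigeonhole argument applies."""
    n = 1
    while not words_exceed_configurations(k, n):
        n += 1
    return n
```

For example, a single-state automaton is already defeated at $n = 8$ (36 words against 33 configurations).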

We now describe a 1Q1CA which, as we subsequently show, recognizes $L_1$ with bounded error. In addition to an initial state, the automaton has sixteen non-terminating states $q_{ijk}, q'_{jk}, q''_{jk}$ ($i, j, k = 1, 2$), four accepting states $o_1, \ldots, o_4$, and eight rejecting states $r_1, \ldots, r_8$. As is customary, we interpret invertible transformations of the set of basis states of the automaton as unitary operators on its quantum configuration space.

When the initial marker ⊢ comes in, the states $q_{111}, q_{121}, q_{211}, q_{221}$ get the amplitudes $\frac12, -\frac12, -\frac12, \frac12$, respectively, while all the remaining states get amplitude 0.

When any of the symbols 0, 1, 3 or 4 arrives, the state remains unchanged. The counter is changed only in the following cases:

- the symbols 0 and 3 increase the counter for the states $q_{1jk}$ and $q_{2jk}$, respectively;
- the symbols 1 and 4 decrease the counter for the states $q_{1jk}$ and $q_{2jk}$, respectively.

The special symbol ♭ is ignored if read in any of the states $q_{2jk}$ or $q''_{jk}$. If it is read in the state $q_{1jk}$, the next state depends on the counter: $q'_{jk}$ follows if the counter is 0, whereas $q'_{jk^*}$¹ follows if the counter is not 0. If it is read in a state $q'_{jk}$, the rejecting states $r_1, \ldots, r_4$ follow.

The special symbol ♯ is ignored if read in any of the states $q_{1jk}$ or $q'_{jk}$. If it is read in the state $q_{2jk}$, the state $q''_{jk}$ follows if the counter is 0, while $q''_{jk^*}$ follows if the counter is not 0. If it is read in a state $q''_{jk}$, the rejecting states $r_5, \ldots, r_8$ follow.

The symbol 2 is ignored if read in any of the states $q_{2jk}$, $q'_{j1}$ or $q''_{jk}$. If it is read in a state $q_{1jk}$, the rejecting states $r_1, \ldots, r_4$ follow. If it is read in a state $q'_{j2}$, the state remains unchanged while the counter decreases.

The symbol 5 is ignored if read in any of the states $q_{1jk}$, $q'_{jk}$ or $q''_{j1}$. If it is read in a state $q_{2jk}$, the rejecting states $r_1, \ldots, r_4$ follow. If it is read in a state $q''_{j2}$, the state remains unchanged while the counter decreases.

When the end marker ⊣ arrives and the value of the counter is 0, a unitary transformation is applied as in the following transition table. Here and in the following tables, the first column lists the initial states, the first row lists the resulting states, and the entries are amplitudes:

            o1     o2     r1     r2     o3     o4     r3     r4
  q'11     1/2    1/2    1/2    1/2     0      0      0      0
  q'12    -1/2    1/2    1/2   -1/2     0      0      0      0
  q'21      0      0      0      0     1/2    1/2    1/2    1/2
  q'22      0      0      0      0     1/2   -1/2    1/2   -1/2
  q''11    1/2   -1/2    1/2   -1/2     0      0      0      0
  q''12   -1/2   -1/2    1/2    1/2     0      0      0      0
  q''21     0      0      0      0     1/2    1/2   -1/2   -1/2
  q''22     0      0      0      0     1/2   -1/2   -1/2    1/2
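As a sanity check, the following sketch uses exact arithmetic (`fractions.Fraction`) to verify that the rows of the end-marker transition table are orthonormal. The transformation is therefore unitary on the span of these eight states. The entries below are the table's values as we read them (rows $q'_{jk}, q''_{jk}$; columns $o_1, o_2, r_1, r_2, o_3, o_4, r_3, r_4$), and the dictionary layout is our own.

```python
from fractions import Fraction

H = Fraction(1, 2)
COLS = ["o1", "o2", "r1", "r2", "o3", "o4", "r3", "r4"]
# Rows of the end-marker table (initial state -> amplitudes over COLS).
END = {
    "q'11":  [H, H, H, H, 0, 0, 0, 0],
    "q'12":  [-H, H, H, -H, 0, 0, 0, 0],
    "q'21":  [0, 0, 0, 0, H, H, H, H],
    "q'22":  [0, 0, 0, 0, H, -H, H, -H],
    "q''11": [H, -H, H, -H, 0, 0, 0, 0],
    "q''12": [-H, -H, H, H, 0, 0, 0, 0],
    "q''21": [0, 0, 0, 0, H, H, -H, -H],
    "q''22": [0, 0, 0, 0, H, -H, -H, H],
}


def is_orthonormal(rows) -> bool:
    """Exact check that a real square matrix has orthonormal rows (is unitary)."""
    rows = list(rows)
    for i, a in enumerate(rows):
        for j, b in enumerate(rows):
            dot = sum(x * y for x, y in zip(a, b))
            if dot != (1 if i == j else 0):
                return False
    return True
```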

To complete the description of the automaton, the transitions given above are extended as follows:

- For some letters, and for zero or non-zero counter values considered separately, no transformation is given above for a non-terminating state. In each such case the state is mapped to some rejecting state. This can easily be accomplished without violating unitarity by adding some more rejecting states.

- The remaining transitions not described above are specified arbitrarily, so as to ensure the unitarity of the matrix describing the transformation for each letter and counter value.

¹ If e is an element of a two-element set (here {1, 2}), we write e* to denote the other of the two elements.

It remains to verify that the automaton described above indeed recognizes the language $L_1$ with bounded error. First we compute the non-zero amplitudes of the automaton's states after a word $x \in \Sigma^*$ has been processed, right before the end marker ⊣. We do this for each of the nine cases $x \in L_1^i \cap L_2^j$, $i, j = 1, 2, 3$.

We look first at the cases $i, j = 1, 2$. The value of the counter is 0 in all these cases. Note that when $x$ is the empty word, the distribution of amplitudes is the same as for $x \in L_1^1 \cap L_2^1$, so we consider these two cases together:

                     q'11   q'12   q'21   q'22   q''11  q''12  q''21  q''22
  x ∈ L1^1 ∩ L2^1    1/2     0    -1/2     0    -1/2     0     1/2     0
  x ∈ L1^1 ∩ L2^2    1/2     0    -1/2     0      0    -1/2     0     1/2
  x ∈ L1^2 ∩ L2^1     0     1/2     0    -1/2   -1/2     0     1/2     0
  x ∈ L1^2 ∩ L2^2     0     1/2     0    -1/2     0    -1/2     0     1/2

The following table shows the non-zero amplitudes after the end marker ⊣ has been processed:

                     o1     o2     r1     r2     o3     o4     r3     r4
  x ∈ L1^1 ∩ L2^1     0     1/2     0     1/2     0      0    -1/2   -1/2
  x ∈ L1^1 ∩ L2^2    1/2    1/2     0      0      0    -1/2   -1/2     0
  x ∈ L1^2 ∩ L2^1   -1/2    1/2     0      0      0     1/2   -1/2     0
  x ∈ L1^2 ∩ L2^2     0     1/2     0    -1/2     0      0    -1/2    1/2

Hence, after the measurement, the accepting probability for $x \in L_1 = (L_1^1 \cap L_2^2) \cup (L_1^2 \cap L_2^1)$ is equal to 3/4, while for $x \in L_1^1 \cap L_2^1$ or $x \in L_1^2 \cap L_2^2$ it is equal to 1/4. The corresponding rejecting probabilities are complementary.
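The acceptance probabilities can be reproduced by multiplying the pre-end-marker amplitudes by the end-marker table and summing the squared amplitudes of $o_1, \ldots, o_4$. In this sketch, a case `(i, j)` stands for $x \in L_1^i \cap L_2^j$. The amplitude values are those of the tables above as we read them.

```python
from fractions import Fraction

H = Fraction(1, 2)
ROWS = ["q'11", "q'12", "q'21", "q'22", "q''11", "q''12", "q''21", "q''22"]
COLS = ["o1", "o2", "r1", "r2", "o3", "o4", "r3", "r4"]
END = [  # end-marker table, one row per state in ROWS
    [H, H, H, H, 0, 0, 0, 0], [-H, H, H, -H, 0, 0, 0, 0],
    [0, 0, 0, 0, H, H, H, H], [0, 0, 0, 0, H, -H, H, -H],
    [H, -H, H, -H, 0, 0, 0, 0], [-H, -H, H, H, 0, 0, 0, 0],
    [0, 0, 0, 0, H, H, -H, -H], [0, 0, 0, 0, H, -H, -H, H],
]
# Amplitudes right before the end marker (counter value 0), per case (i, j).
BEFORE = {
    (1, 1): {"q'11": H, "q'21": -H, "q''11": -H, "q''21": H},
    (1, 2): {"q'11": H, "q'21": -H, "q''12": -H, "q''22": H},
    (2, 1): {"q'12": H, "q'22": -H, "q''11": -H, "q''21": H},
    (2, 2): {"q'12": H, "q'22": -H, "q''12": -H, "q''22": H},
}


def accept_probability(amps) -> Fraction:
    """Apply the end-marker table and sum |amplitude|^2 over accepting states."""
    out = [Fraction(0)] * len(COLS)
    for state, a in amps.items():
        row = END[ROWS.index(state)]
        out = [o + a * m for o, m in zip(out, row)]
    return sum(o * o for o, c in zip(out, COLS) if c.startswith("o"))


PROBS = {case: accept_probability(a) for case, a in BEFORE.items()}
```

The two cases belonging to $L_1$ yield 3/4, and the other two yield 1/4.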

It remains to check the cases in which $i$ or $j$ in $L_1^i \cap L_2^j$ is equal to 3. Just before the end marker ⊣ arrives, the amplitudes of the non-terminating states with counter value zero then have the following absolute values:

                     q'11   q'12   q'21   q'22   q''11  q''12  q''21  q''22
  x ∈ L1^1 ∩ L2^3    1/2     0     1/2     0      0      0      0      0
  x ∈ L1^2 ∩ L2^3     0     1/2     0     1/2     0      0      0      0
  x ∈ L1^3 ∩ L2^1     0      0      0      0     1/2     0     1/2     0
  x ∈ L1^3 ∩ L2^2     0      0      0      0      0     1/2     0     1/2

The remaining amplitude, of squared norm 1/2, belongs to states whose counter value differs from 0, or has already led to rejection. Either way, it can only cause a rejecting state to be observed.

In the remaining case $x \in L_1^3 \cap L_2^3$, either the word is rejected before the end marker, or all states with counter value zero have zero amplitudes.
Hence, after the end marker has been processed, each terminating state has an amplitude of absolute value 1/4:

                     o1    o2    r1    r2    o3    o4    r3    r4
  x ∈ L1^1 ∩ L2^3    1/4   1/4   1/4   1/4   1/4   1/4   1/4   1/4
  x ∈ L1^2 ∩ L2^3    1/4   1/4   1/4   1/4   1/4   1/4   1/4   1/4
  x ∈ L1^3 ∩ L2^1    1/4   1/4   1/4   1/4   1/4   1/4   1/4   1/4
  x ∈ L1^3 ∩ L2^2    1/4   1/4   1/4   1/4   1/4   1/4   1/4   1/4

In the case $x \in L_1^3 \cap L_2^3$, all amplitudes of non-terminating states with counter value zero are still 0. The probability of acceptance of $x$ in each of the four cases above is therefore $4 \cdot (1/4)^2 = 1/4$, while in the case $x \in L_1^3 \cap L_2^3$ it is zero. The rejecting probabilities are complementary.

To sum up, the automaton accepts all words in the language $L_1$ with probability 3/4, and rejects all words not in $L_1$ with probability at least 3/4.
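The value 1/4 in the four cases involving $L^3$ does not depend on the signs of the two surviving amplitudes. The sketch below checks this by brute force over all sign choices. It assumes the end-marker table read above, and it assumes that exactly two zero-counter states carry amplitude of absolute value 1/2.

```python
from fractions import Fraction
from itertools import product

H = Fraction(1, 2)
ROWS = ["q'11", "q'12", "q'21", "q'22", "q''11", "q''12", "q''21", "q''22"]
COLS = ["o1", "o2", "r1", "r2", "o3", "o4", "r3", "r4"]
END = [
    [H, H, H, H, 0, 0, 0, 0], [-H, H, H, -H, 0, 0, 0, 0],
    [0, 0, 0, 0, H, H, H, H], [0, 0, 0, 0, H, -H, H, -H],
    [H, -H, H, -H, 0, 0, 0, 0], [-H, -H, H, H, 0, 0, 0, 0],
    [0, 0, 0, 0, H, H, -H, -H], [0, 0, 0, 0, H, -H, -H, H],
]
# Surviving zero-counter states for x in L1^1∩L2^3, L1^2∩L2^3, L1^3∩L2^1, L1^3∩L2^2.
CASES = [("q'11", "q'21"), ("q'12", "q'22"), ("q''11", "q''21"), ("q''12", "q''22")]


def accept_probability(amps) -> Fraction:
    out = [Fraction(0)] * len(COLS)
    for state, a in amps.items():
        out = [o + a * m for o, m in zip(out, END[ROWS.index(state)])]
    return sum(o * o for o, c in zip(out, COLS) if c.startswith("o"))


ALL = {
    accept_probability({a: sa * H, b: sb * H})
    for (a, b) in CASES
    for sa, sb in product((1, -1), repeat=2)
}
```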

Theorem 2. The language $L_2$ cannot be recognized with bounded error by probabilistic one-counter one-way finite automata, but it can be recognized with probability 3/4 by a quantum one-counter one-way finite automaton.

Proof. For the first statement, we first note that the language $L_1$ cannot be recognized with probability 1 by a probabilistic one-way one-counter finite automaton. Indeed, assume the contrary. Simulate the probabilistic automaton by a deterministic one which, at any time, takes the first of the choices that the probabilistic automaton makes with positive probability. Such a simulation can be built, because our automaton reads one input symbol at a time and moves to the next symbol. Since every computation path of an automaton that recognizes $L_1$ with probability 1 gives the correct answer, this would contradict the first part of Theorem 1. The impossibility of recognizing $L_2$ by a probabilistic automaton with bounded error now follows, since a word of $L_2$ can contain arbitrarily many subwords $x_i \in L_1$, and every $x_i$ is processed with a positive error.

For the second part of the theorem, we extend the construction of the quantum automaton described in the proof of Theorem 1. We add transformations for the symbols $\alpha$, $\beta_1$ and $\beta_2$, for which we need four additional non-terminating states $q_1, q_2, q_3, q_4$.

The transformation for $\alpha$ is as follows:

            q1      q2      r1     r2     r3     q3      q4      r4
  q'11     1/√2     0      1/2    1/2     0      0       0       0
  q'12      0     -1/√2    1/2   -1/2     0      0       0       0
  q'21      0       0       0      0     1/2     0      1/√2    1/2
  q'22      0       0       0      0     1/2   -1/√2     0     -1/2
  q''11     0      1/√2    1/2   -1/2     0      0       0       0
  q''12   -1/√2     0      1/2    1/2     0      0       0       0
  q''21     0       0       0      0     1/2    1/√2     0     -1/2
  q''22     0       0       0      0     1/2     0     -1/√2    1/2
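The following floating-point sketch checks that the $\alpha$ table has orthonormal rows, that is, that it describes a unitary map. The entries are our reading of the table. Some signs were not legible in the source and are reconstructed so as to be consistent with unitarity and with the amplitudes computed after $\alpha$. Treat them as assumptions.

```python
import math

S, H = 1 / math.sqrt(2), 0.5
COLS = ["q1", "q2", "r1", "r2", "r3", "q3", "q4", "r4"]
ALPHA = {  # initial state -> amplitudes over COLS (partly reconstructed signs)
    "q'11":  [S, 0, H, H, 0, 0, 0, 0],
    "q'12":  [0, -S, H, -H, 0, 0, 0, 0],
    "q'21":  [0, 0, 0, 0, H, 0, S, H],
    "q'22":  [0, 0, 0, 0, H, -S, 0, -H],
    "q''11": [0, S, H, -H, 0, 0, 0, 0],
    "q''12": [-S, 0, H, H, 0, 0, 0, 0],
    "q''21": [0, 0, 0, 0, H, S, 0, -H],
    "q''22": [0, 0, 0, 0, H, 0, -S, H],
}


def is_orthonormal(rows, tol: float = 1e-12) -> bool:
    """Check row orthonormality of a real square matrix up to rounding."""
    rows = list(rows)
    return all(
        abs(sum(x * y for x, y in zip(a, b)) - (1.0 if i == j else 0.0)) < tol
        for i, a in enumerate(rows)
        for j, b in enumerate(rows)
    )
```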
The transformation for the states $q_1, q_2, q_3, q_4$ with counter value zero is given by one table for both letters $\beta_1$ and $\beta_2$. Its first header row lists the resulting states for $\beta_1$ and its second header row lists those for $\beta_2$. The rows of the table are $q_1, \ldots, q_4$. The resulting states are $q_{111}, q_{121}, q_{211}, q_{221}$ together with rejecting states $r_1, \ldots, r_4$. The entries are 0, 1 and, up to sign, $1/\sqrt{2}$.



Finally, we describe the transformation for the end marker ⊣ for the states $q_1, q_2, q_3, q_4$ with counter value zero:

          o1    o2    o3      r1
  q1      1     0     0       0
  q2      0     1     0       0
  q3      0     0    1/√2    1/√2
  q4      0     0    1/√2   -1/√2




To complete the description of the automaton, the described transformations are extended, for each letter and for zero and non-zero counter values, as in the automaton of Theorem 1.

It remains to verify that the automaton described above recognizes the language $L_2$ with probability 3/4.

While processing $x_1$ the automaton acts as described for $L_1$. So when the first $\alpha$ arrives, the distribution of amplitudes is exactly the same as in the proof for $L_1$ right before ⊣. The following table describes the situation after the transformation for $\alpha$ has been applied. Its first eight columns give the amplitudes of the states with counter value zero. The column Pr gives the total probability of rejection after processing $x_1 \alpha$. The last four columns give the amplitudes of $q_1, \ldots, q_4$ after the measurement:

                     q1        q2       r1    r2    r3    q3        q4        r4   | Pr  |  q1     q2     q3     q4
  x1 ∈ L1^1 ∩ L2^1   1/(2√2)  -1/(2√2)   0    1/2    0    1/(2√2)  -1/(2√2)  -1/2  | 1/2 |  1/2   -1/2    1/2   -1/2
  x1 ∈ L1^1 ∩ L2^2   1/√2       0        0     0     0     0       -1/√2      0   |  0  |  1/√2    0      0    -1/√2
  x1 ∈ L1^2 ∩ L2^1    0       -1/√2      0     0     0    1/√2       0        0   |  0  |   0    -1/√2   1/√2    0
  x1 ∈ L1^2 ∩ L2^2   1/(2√2)  -1/(2√2)   0   -1/2    0    1/(2√2)  -1/(2√2)   1/2  | 1/2 |  1/2   -1/2    1/2   -1/2

So the probability of rejection is 0 only in the cases $x_1 \in L_1^1 \cap L_2^2$ or $x_1 \in L_1^2 \cap L_2^1$. When $x_1 \in L_1^1 \cap L_2^1$ or $x_1 \in L_1^2 \cap L_2^2$, the probability of rejection is 1/2.

A similar computation for the cases in which $i$ or $j$ in $L_1^i \cap L_2^j$ is equal to 3 gives a probability of rejection of at least 3/4, so these cases are not considered further.
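The rejection probabilities after $\alpha$ can be recomputed from the $\alpha$ table and the amplitudes right before $\alpha$ (which equal those right before ⊣ in Theorem 1). In the sketch, a case `(i, j)` means $x_1 \in L_1^i \cap L_2^j$. The $\alpha$ entries use our partly reconstructed signs. The rejection probabilities do not depend on those signs, but the signs of the renormalized amplitudes do.

```python
import math

S, H = 1 / math.sqrt(2), 0.5
ROWS = ["q'11", "q'12", "q'21", "q'22", "q''11", "q''12", "q''21", "q''22"]
COLS = ["q1", "q2", "r1", "r2", "r3", "q3", "q4", "r4"]
ALPHA = [
    [S, 0, H, H, 0, 0, 0, 0], [0, -S, H, -H, 0, 0, 0, 0],
    [0, 0, 0, 0, H, 0, S, H], [0, 0, 0, 0, H, -S, 0, -H],
    [0, S, H, -H, 0, 0, 0, 0], [-S, 0, H, H, 0, 0, 0, 0],
    [0, 0, 0, 0, H, S, 0, -H], [0, 0, 0, 0, H, 0, -S, H],
]
BEFORE = {  # amplitudes right before alpha, counter value 0
    (1, 1): {"q'11": H, "q'21": -H, "q''11": -H, "q''21": H},
    (1, 2): {"q'11": H, "q'21": -H, "q''12": -H, "q''22": H},
    (2, 1): {"q'12": H, "q'22": -H, "q''11": -H, "q''21": H},
    (2, 2): {"q'12": H, "q'22": -H, "q''12": -H, "q''22": H},
}


def after_alpha(amps):
    """Return (rejection probability, renormalized amplitudes of q1..q4)."""
    out = [0.0] * len(COLS)
    for state, a in amps.items():
        out = [o + a * m for o, m in zip(out, ALPHA[ROWS.index(state)])]
    p_rej = sum(o * o for o, c in zip(out, COLS) if c.startswith("r"))
    q = {c: o for o, c in zip(out, COLS) if c.startswith("q")}
    norm = math.sqrt(sum(v * v for v in q.values()))
    return p_rej, {c: v / norm for c, v in q.items()}
```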

Suppose the automaton does not terminate during the measurement that follows $\alpha$. By the definition of the measurement, non-zero amplitudes then remain only for non-terminating states, and the vector is renormalized. The amplitudes after the measurement in this case are shown in the last columns of the table.

If $\beta_1$ or $\beta_2$ comes after $\alpha$, the amplitudes become as follows. The table gives absolute values. The first header row gives the resulting states for $\beta_1$ and the second those for $\beta_2$:

                     (β1)  q111      q121      q211      q221      r1        r2        r3        r4
                     (β2)  r1        r2        r3        r4        q111      q121      q211      q221
  x1 ∈ L1^1 ∩ L2^1         1/(2√2)   1/(2√2)   1/(2√2)   1/(2√2)   1/(2√2)   1/(2√2)   1/(2√2)   1/(2√2)
  x1 ∈ L1^1 ∩ L2^2         1/2       1/2       1/2       1/2       0         0         0         0
  x1 ∈ L1^2 ∩ L2^1         0         0         0         0         1/2       1/2       1/2       1/2
  x1 ∈ L1^2 ∩ L2^2         1/(2√2)   1/(2√2)   1/(2√2)   1/(2√2)   1/(2√2)   1/(2√2)   1/(2√2)   1/(2√2)

So if $x_1 \in L_1^1 \cap L_2^2$ and $\beta_1$ is read, or $x_1 \in L_1^2 \cap L_2^1$ and $\beta_2$ is read, we get the same amplitude distribution as after the initial marker ⊢.

If $x_1 \in L_1^1 \cap L_2^2$ and $\beta_2$ is read, or $x_1 \in L_1^2 \cap L_2^1$ and $\beta_1$ is read, the word is rejected with probability 1.

If $x_1 \in L_1^1 \cap L_2^1$ or $x_1 \in L_1^2 \cap L_2^2$, half of the remaining probability is in rejecting states. Since the probability of rejection for such a word is already 1/2 after $\alpha$, the total probability of rejection becomes 3/4.

We should also check the cases in which $\beta_1$ or $\beta_2$ occurs in a position other than right after $\alpha$. Due to the construction of the automaton, the word is then rejected immediately.

We have thus seen that after processing $x_1 \alpha y_1$, the automaton is in the same quantum state as after reading the initial marker in two cases: $x_1 \in L_1^1 \cap L_2^2$ and $y_1 = \beta_1$, or $x_1 \in L_1^2 \cap L_2^1$ and $y_1 = \beta_2$. Otherwise the word is rejected with probability at least 3/4. If the previous part of the word satisfies the conditions of the language, the computation on $x_i \alpha y_i$ is therefore the same as on $x_1 \alpha y_1$. Otherwise the word is rejected with probability at least 3/4.

Finally, we need to consider the processing of the end marker. Note that it must come right after $\alpha$; otherwise the word is rejected. Assume that the end marker comes after $x_1 \alpha$. The resulting amplitudes have the following absolute values:

                     o1        o2        o3     r1
  x1 ∈ L1^1 ∩ L2^1   1/(2√2)   1/(2√2)    0     1/2
  x1 ∈ L1^1 ∩ L2^2   1/√2       0        1/2    1/2
  x1 ∈ L1^2 ∩ L2^1    0        1/√2      1/2    1/2
  x1 ∈ L1^2 ∩ L2^2   1/(2√2)   1/(2√2)    0     1/2

So the word is accepted with probability 3/4 in the cases $x_1 \in L_1^1 \cap L_2^2$ or $x_1 \in L_1^2 \cap L_2^1$. In the cases $x_1 \in L_1^1 \cap L_2^1$ or $x_1 \in L_1^2 \cap L_2^2$, the word is already rejected with probability 1/2 after processing $\alpha$, and the total probability of rejection in these cases is 3/4.
Abstract. A distributed algorithm for the single source shortest path problem for directed graphs with arbitrary edge lengths is proposed. The new algorithm is based on relaxations and uses reverse search for inspecting edges, and thus avoids using any additional data structures. At the same time the algorithm uses a novel way to recognize a reachable negative-length cycle in the graph, which facilitates the scalability of the algorithm.



1 Introduction 

The single source shortest paths problem is a key component of many applications, and many efficient sequential algorithms have been proposed for its solution (for an excellent survey see [3]). However, in many applications graphs are too massive to fit completely inside the computer's internal memory. The resulting input/output communication between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck.

In particular, in the LTL model checking application (see Section 6) the graph is typically extremely large. To optimize the space complexity of the computation, the graph is generated on-the-fly. Successors of a vertex are determined dynamically, so there is no need to store any information about edges permanently. Therefore the techniques used in external memory algorithms are not applicable, since we do not know any properties of the examined graph in advance. Nor are the parallel algorithms based on an adjacency matrix representation of the graph.

The approach we have taken is to increase the computational power (especially the amount of randomly accessed memory) by building a powerful parallel computer as a network of cheap workstations with disjoint memory, which communicate via message passing.

With respect to the intended application, the space requirements are the main limiting factor even in the distributed environment. Therefore we have been looking for a distributed algorithm compatible with other space-saving techniques (e.g. the on-the-fly technique or the partial order technique). Our distributed algorithm is therefore based on the relaxation of the graph's edges [5].

Distributed relaxation-based algorithms are known only for special settings of the single source shortest paths problem. For general digraphs with non-negative edge lengths, parallel algorithms are presented in [6,7,9]. For special classes of graphs, like planar digraphs [10], graphs with separator decomposition [4] or graphs with small tree-width [2], more efficient algorithms are known. Yet none of these algorithms is applicable to general digraphs with potential negative-length cycles.

grant No. 201/00/1023.

The most notable features of our proposed distributed algorithm are the reverse search and walk to root approaches. The reverse search method is known to be an exceedingly space-efficient technique [1,8]. The reverse search can use the data structures of the proposed algorithm directly. This avoids the memory that structures for traversing the graph (such as a queue or a stack) would otherwise require. This could save up to one third of memory, which is practically significant.

Walk to root is a strategy for detecting the presence of a negative-length cycle in the input graph. The cycle is looked for in the graph of parent pointers maintained by the method. The cycles of the parent graph, however, can appear and disappear. The aim is to detect a cycle as soon as possible without significantly increasing the time complexity of the underlying relaxation algorithm. To that end we introduce a solution which allows us to amortize the time complexity of cycle detection over the complexity of relaxation.

2 Problem Definition and General Method 

Let $(G, s, l)$ be a given triple, where $G = (V, E)$ is a directed graph, $l : E \to \mathbb{R}$ is a length function mapping edges to real-valued lengths, and $s \in V$ is the source vertex. We denote $n = |V|$ and $m = |E|$. The length $l(p)$ of a path $p$ is the sum of the lengths of its constituent edges. We define the shortest path length from $s$ to $v$ by
$$\delta(s, v) = \begin{cases} \min\{l(p) \mid p \text{ is a path from } s \text{ to } v\} & \text{if there is such a path,} \\ \infty & \text{otherwise.} \end{cases}$$
A shortest path from vertex $s$ to vertex $v$ is then defined as any path $p$ with length $l(p) = \delta(s, v)$. If the graph $G$ contains no negative-length cycles (negative cycles) reachable from the source vertex $s$, then the shortest path length remains well-defined for all $v \in V$, and the graph is called feasible. The single source shortest paths (SSSP) problem is to determine whether the given graph is feasible and, if so, to compute $\delta(s, v)$ for all $v \in V$. For the purposes of our algorithm we suppose that some linear ordering on the vertices is given.

The general method for solving the SSSP problem is the relaxation method [3,5]. For every vertex $v$ the method maintains its distance label $d(v)$ and parent vertex $p(v)$. The subgraph $G_p$ of $G$ induced by the edges $(p(v), v)$ for all $v$ such that $p(v) \ne nil$ is called the parent graph. The method starts by setting $d(s) = 0$ and $p(s) = nil$; all other labels are initially $\infty$. At every step the method selects an edge $(v, u)$ and relaxes it. That is, if $d(u) > d(v) + l(v, u)$, it sets $d(u)$ to $d(v) + l(v, u)$ and sets $p(u)$ to $v$.
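The relaxation step translates directly into code. To show it inside a complete loop, the sketch below pairs it with one simple edge-selection strategy, Bellman-Ford-style sweeps over all edges. This is a standard textbook strategy used here only for illustration, not the strategy of this paper. Vertices are assumed to be the integers `0..n-1`.

```python
import math


def relax(d, p, length, v, u) -> bool:
    """Relax edge (v, u): if d(u) > d(v) + l(v, u), lower d(u) and set p(u) = v."""
    if d[u] > d[v] + length[(v, u)]:
        d[u] = d[v] + length[(v, u)]
        p[u] = v
        return True
    return False


def relax_all_edges(n, edges, s):
    """Bellman-Ford-style sweeps until no label improves; if labels still
    improve after n sweeps, a negative cycle is reachable from s."""
    d = {v: math.inf for v in range(n)}
    p = {v: None for v in range(n)}
    d[s] = 0
    length = {(v, u): l for v, u, l in edges}
    for _ in range(n):
        # A list (not a generator) so that every edge is relaxed in each sweep.
        if not any([relax(d, p, length, v, u) for v, u, _l in edges]):
            return d, p            # stable: d(v) = delta(s, v) for all v
    raise ValueError("negative cycle reachable from s")
```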

If no $d(v)$ can be improved by any relaxation, then $d(v) = \delta(s, v)$ for all $v \in V$ and $G_p$ determines the shortest paths. Different strategies for selecting the edge to be relaxed lead to different algorithms. For graphs where negative cycles could exist, the relaxation method must be modified to recognize the infeasibility of the graph. As in the case of relaxation, various strategies are used to detect negative cycles [3]. However, not all of them are suitable for our purposes:

- some are uncompetitive, for example the time-out strategy;
- some are not suitable for distribution, such as the admissible graph search, which uses the hardly parallelizable DFS, or the level-based strategy, which employs global data structures.

For our version of distributed SSSP we have used the walk to root strategy.

The sequential walk to root strategy can be described as follows. Suppose the relaxation operation applies to an edge $(v, u)$ (i.e. $d(u) > d(v) + l(v, u)$) and the parent graph $G_p$ is acyclic. This operation creates a cycle in $G_p$ if and only if $u$ is an ancestor of $v$ in the current parent graph. This can be detected by following the parent pointers from $v$ to $s$. If the vertex $u$ lies on this path, then there is a negative cycle. Otherwise the relaxation operation does not create a cycle.
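The ancestor test can be sketched in a few lines. The parent map `p` (a dictionary with `None` for the root) is our own encoding, and the parent graph is assumed to be an acyclic tree rooted at `s`, as in the description above.

```python
def creates_cycle(p, s, v, u) -> bool:
    """Walk to root: would setting p(u) := v close a cycle in the parent graph?

    Follows parent pointers from v towards the source s and reports whether
    u lies on that path, i.e. whether u is v itself or an ancestor of v."""
    at = v
    while at != s and at != u:
        at = p[at]
    return at == u
```

For the chain $0 \to 1 \to 2 \to 3$, making 3 the parent of 1 would close a cycle, whereas making 1 the parent of 3 would not.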

The walk to root method gives immediate cycle detection and can easily be combined with the relaxation method. However, since the path to the root can be long, it increases the cost of applying the relaxation operation to an edge to $O(n)$. We can use amortization to pay the cost of checking $G_p$ for cycles. Since the cost of such a search is $O(n)$, the search is performed only after the underlying shortest paths algorithm has performed $\Omega(n)$ work. The running time is thus increased only by a constant factor. However, to preserve correctness, the behavior of walk to root has to be significantly modified. The amortization is used in the distributed algorithm and is described in detail in Section 5.

3 Reverse Search 

Reverse search was originally a technique for generating large sets of discrete objects [1,8]. It can be viewed as a depth-first graph traversal that requires neither a stack nor node marks to be stored explicitly: all necessary information can be recomputed. Such recomputations are naturally time-consuming, but when traversing extremely large graphs, the actual problem is not the time but the memory requirements.

In its basic form, reverse search can be viewed as the traversal of a spanning tree, called the reverse search tree. We are given a local search function $f$ and an optimum vertex $v^*$. For every vertex $v$, repeated application of $f$ has to generate a path from $v$ to $v^*$. The set of these paths defines the reverse search tree with the root $v^*$. A reverse search is initiated at $v^*$, and only edges of the reverse search tree are traversed.

In the context of the SSSP problem we want to traverse the graph $G$. The parent graph $G_p$ corresponds to the reverse search tree. The optimum vertex $v^*$ corresponds to the source vertex $s$, and the local search function $f$ to the parent function $p$. The correspondence is not exact, since $p(v)$ can change during the computation, whereas the original search function is fixed. Consequently, some vertices can be visited more than once, which is in fact the desired behavior for our application. Moreover, if there is a negative cycle in the graph $G$, then a cycle in $G_p$ will occur and $G_p$ will not be a spanning tree. In such a situation we are not interested in the shortest distances, and the way in which the graph is traversed no longer matters. We just need to detect such a situation, and this is delegated to the cycle detection strategy.



proc Reverse_search(s)
  p(s) := ⊥;
  v := s;
  while v ≠ ⊥ do
    Do_something(v);
    u := Get_successor(v, NULL);
    while u does not exist and v ≠ s do
      last := v; v := p(v);
      u := Get_successor(v, last);
    od
    if u exists then v := u else v := ⊥ fi
  od
end

proc Call_recursively(v)
  Do_something(v);
  for each edge (v, w) ∈ E do
    if p(w) = v then
      Call_recursively(w)
    fi
  od
end

Fig. 1. Demonstration of the reverse search



Fig. 1 demonstrates the use of the reverse search within our algorithm. Both procedures, Call_recursively(v) and Reverse_search(v), traverse the subtree of $v$ in the same manner and perform some operation on each of its vertices. But Call_recursively uses a stack (the recursion stack), whereas Reverse_search uses the parent edges for the traversal. The function Get_successor(v, w) returns the first successor $u$ of $v$ that is greater than $w$ with respect to the ordering on the vertices and satisfies $p(u) = v$. If no such successor exists, an appropriate announcement is returned.
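The equivalence of the two procedures in Fig. 1 can be checked on a small example. In this sketch, `succ[v]` lists the successors of `v` sorted by the vertex ordering, `p` is a fixed parent map with `None` standing for ⊥, and `get_successor` is a hypothetical implementation of Get_successor. The stack-free version only ever stores the current vertex and the child it came from.

```python
from typing import List, Optional


def get_successor(succ, p, v, w) -> Optional[int]:
    """Hypothetical Get_successor(v, w): the first successor u of v (in the
    vertex ordering) with u > w and p(u) = v; w = None means 'from the start'."""
    for u in succ[v]:                      # succ[v] is sorted by the ordering
        if (w is None or u > w) and p.get(u) == v:
            return u
    return None                            # "no such successor exists"


def reverse_search(succ, p, s) -> List[int]:
    """Stack-free traversal of the parent tree rooted at s (cf. Fig. 1)."""
    visited = []
    v = s
    while v is not None:
        visited.append(v)                  # Do_something(v)
        u = get_successor(succ, p, v, None)
        while u is None and v != s:        # climb back up via parent edges
            last, v = v, p[v]
            u = get_successor(succ, p, v, last)
        v = u                              # None: back at s, nothing left
    return visited


def call_recursively(succ, p, v, visited=None) -> List[int]:
    """The same traversal using the (implicit) recursion stack."""
    if visited is None:
        visited = []
    visited.append(v)                      # Do_something(v)
    for w in succ[v]:
        if p.get(w) == v:
            call_recursively(succ, p, w, visited)
    return visited
```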



4 Sequential SSSP Algorithm with Reverse Search 

We present the sequential algorithm (Fig. 2) and prove its correctness and com- 
plexity first. This algorithm forms the base of the distributed algorithm presented 
in the subsequent section. 

The Trace procedure visits vertices in the graph (we say that a vertex is visited if it is the value of the variable $v$). The procedure terminates either when a negative cycle is detected or when the traversal of the graph is completed.

The RGS function combines the relaxation of an edge, as introduced in Section 2, with the Get_successor function from Section 3. It finds the next vertex $u$ whose label can be improved. The change of $p(u)$ can create a cycle in $G_p$, and therefore the WTR procedure is started to detect this possibility. If the change is safe, the values $d(u)$ and $p(u)$ are updated and $u$ is returned.

In what follows, the correctness of the algorithm is stated. Due to space limits, the proofs are only sketched.
1  proc Trace(s)
2    p(s) := ⊥; v := s;
3    while v ≠ ⊥ do
4      u := RGS(v, NULL);
5      while u does not exist do
6        last := v; v := p(v);
7        u := RGS(v, last); od
8      v := u; od
9  end

1  proc RGS(v, last) {Relax and Get Successor}
2    u := successor of v greater than last;
3    while u exists and d(u) ≤ d(v) + l(v, u) do
4      u := next successor of v; od
5    if u exists then
6      WTR(v, u);
7      d(u) := d(v) + l(v, u); p(u) := v;
8      return u;
9    else return u does not exist; fi
10 end

1  proc WTR(at, looking_for) {Walk To Root}
2    while at ≠ s and at ≠ looking_for do at := p(at); od
3    if at = looking_for then negative cycle detected fi
4  end



Fig. 2. Pseudo-code of the sequential algorithm 
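A direct Python transcription of Fig. 2 may help in following the lemmas below. It is a sketch under assumptions, not the authors' implementation: the graph is a dict `{v: {u: length}}` containing every vertex as a key, vertex names are mutually comparable (so "successor greater than last" is well defined), ⊥ is represented by `None`, and raising the hypothetical `NegativeCycle` exception stands in for "negative cycle detected".

```python
import math

class NegativeCycle(Exception):
    """Raised when the walk to root reaches the vertex it is looking for."""

def sssp_reverse_search(graph, s):
    d = {v: math.inf for v in graph}
    p = {v: None for v in graph}          # None plays the role of ⊥
    d[s] = 0

    def wtr(at, looking_for):
        # Walk To Root: follow parent pointers from `at` towards the source.
        while at != s and at != looking_for:
            at = p[at]
        if at == looking_for:
            raise NegativeCycle(looking_for)

    def rgs(v, last):
        # Relax and Get Successor: first successor greater than `last`
        # whose label can be improved; None means "u does not exist".
        for u in sorted(u for u in graph[v] if last is None or u > last):
            new = d[v] + graph[v][u]
            if new < d[u]:
                wtr(v, u)                 # would p(u) := v close a cycle?
                d[u], p[u] = new, v
                return u
        return None

    v = s
    while v is not None:                  # Trace
        u = rgs(v, None)
        while u is None:
            last, v = v, p[v]             # backtrack along the parent edge
            if v is None:
                return d, p
            u = rgs(v, last)
        v = u
    return d, p
```

On the graph `{"s": {"a": 4, "b": 1}, "a": {"c": 1}, "b": {"a": 2, "c": 5}, "c": {}}` the vertex a is first reached directly from s and later re-parented under b once the shorter path through b is relaxed, exactly the situation covered by Lemma 2.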



Lemma 1. Let G contain no negative cycle reachable from the source vertex s. Then $G_p$ forms a rooted tree with root s, and $d(v) \ge \delta(s,v)$ for all $v \in V$ at any time during the computation. Moreover, once $d(v) = \delta(s,v)$, it never changes.

Proof: The proof is essentially the same as for other relaxation methods [5].

Lemma 2. After every change of the value $d(v)$ the algorithm visits the vertex v.

Proof: Follows directly from the algorithm. 

Lemma 3. Let G contain no negative cycle reachable from the source vertex s. Every time a vertex w is visited, the sequence S of assignments on line 6 of the procedure Trace will eventually be executed for this vertex. Until this happens, $p(w)$ is not changed.

Proof: The value $p(w)$ cannot be changed because G has no negative cycle and, due to Lemma 1, the parent graph $G_p$ does not have any cycle. Let $h(w)$ denote the depth of w in $G_p$. We prove the lemma by backward induction (from n to 0) with respect to $h(w)$. For the basis we have $h(w) = n$; then w has no child, and therefore RGS(w, NULL) returns u does not exist and the sequence S is executed immediately. For the inductive step we assume that the lemma holds for each v such that $h(v) \ge k$, let $h(w) = k - 1$, and let $\{a_1, a_2, \ldots, a_r\} = \{u \mid p(u) = w\}$ be the children of w in $G_p$. Since $h(a_i) = k$ for all $i \in \{1, \ldots, r\}$, we can use the induction hypothesis for each $a_i$ and show that the value of the variable u in RGS is equal to $a_i$ exactly once. Therefore RGS returns u does not exist for w after a finite number of steps and the sequence S is executed. ■
Theorem 1 (Correctness of the sequential algorithm). If G has no negative cycle reachable from the source s, then the sequential algorithm terminates with $d(v) = \delta(s,v)$ for all $v \in V$ and $G_p$ forms a shortest-paths tree rooted at s. If G has a negative cycle, its existence is reported.

Proof: Let us first suppose that there is no negative cycle. Lemma 3 applied to the source vertex s gives the termination of the algorithm. Let $v \in V$ and let $\langle v_0, v_1, \ldots, v_k \rangle$ with $s = v_0$ and $v = v_k$ be a shortest path from s to v. We show that $d(v_i) = \delta(s, v_i)$ for all $i \in \{0, \ldots, k\}$ by induction on i, and therefore $d(v) = \delta(s, v)$. For the basis, $d(v_0) = d(s) = \delta(s,s) = 0$ by Lemma 1. From the induction hypothesis we have $d(v_i) = \delta(s, v_i)$. The value $d(v_i)$ was set to $\delta(s, v_i)$ at some moment during the computation. By Lemma 2, vertex $v_i$ is visited afterwards and the edge $(v_i, v_{i+1})$ is relaxed. Due to Lemma 1,

$$d(v_{i+1}) \ge \delta(s, v_{i+1}) = \delta(s, v_i) + l(v_i, v_{i+1}) = d(v_i) + l(v_i, v_{i+1})$$

holds before the relaxation, and therefore

$$d(v_{i+1}) = d(v_i) + l(v_i, v_{i+1}) = \delta(s, v_i) + l(v_i, v_{i+1}) = \delta(s, v_{i+1})$$

holds after the relaxation. By Lemma 1 this equality is maintained afterwards.

For all vertices v, u with $v = p(u)$ we have $d(u) = d(v) + l(v, u)$. This follows directly from line 7 of the RGS procedure. After termination, $d(v) = \delta(s, v)$ for all $v \in V$, and therefore $G_p$ forms a shortest-paths tree.

On the other hand, if there is a negative cycle in G, then the relaxation process alone would run forever and would create a cycle in $G_p$. The cycle is detected because, before any change of $p(v)$, WTR tests whether this change would create a cycle in $G_p$. ■

Let us suppose that edges have integer lengths and let $C = \max\{|l(u,v)| : (u,v) \in E\}$.

Theorem 2. The worst-case time complexity of the sequential algorithm is $O(Cn^4)$.

Proof: Each shortest path consists of at most $n - 1$ edges, and $-C(n-1) \le \delta(s,v) \le C(n-1)$ holds for all $v \in V$. Each vertex v is visited only after $d(v)$ is lowered. Therefore each vertex is visited at most $O(Cn)$ times. Each visit consists of updating at most n successors, and an update can take $O(n)$ time (due to the walk to root). Together we have an $O(Cn^3)$ bound on the total visiting time of each vertex and an $O(Cn^4)$ bound for the algorithm. ■
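The counting in this proof can be laid out step by step. This is only a restatement of the argument above, assuming, as the proof does, that $d(v)$ is an integer that only decreases and always lies between $-C(n-1)$ and $C(n-1)$ once it is finite:

```latex
\begin{aligned}
\#\{\text{visits of } v\} &\le \bigl|\{-C(n-1), \dots, C(n-1)\}\bigr| = 2C(n-1) + 1 = O(Cn)\\
\text{cost of one visit} &\le \underbrace{n}_{\text{successors}} \cdot \underbrace{O(n)}_{\text{walk to root}} = O(n^2)\\
\text{total time} &\le \underbrace{n}_{\text{vertices}} \cdot O(Cn) \cdot O(n^2) = O(Cn^4)
\end{aligned}
```

If the walk to root is removed, the cost of one visit drops to $O(n)$ and the same count gives $O(Cn^3)$.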

We stress that the walk to root is not essential in this algorithm: the algorithm can easily be modified to detect a cycle without the walk to root and to run in $O(Cn^3)$ time. The walk to root has been used to make the presentation of the distributed algorithm (where the walk to root is essential) clearer.



5 Distributed Algorithm 

For the distributed algorithm we suppose that the set of vertices is divided into disjoint subsets. The distribution is determined by the function owner, which assigns every vertex v to a processor i. Processor i is responsible for the subgraph determined by its subset of vertices. A good partition of vertices among processors is important because it has a direct impact on communication complexity and thus on the run-time of the program. We do not discuss it here because it is a difficult problem in itself and depends on the concrete application.

The main idea of the distributed algorithm (Fig. 3) can be summarized as follows. The computation is initialized by the processor which owns the source vertex by calling Trace(s, ⊥), and it spreads to other processors as soon as the traversal visits the “border” vertices. Each processor visits vertices in basically the same manner as the sequential algorithm does.

While relaxation can be performed in parallel, the realization of the walk to root requires more careful treatment. Even if adding the edge that initiates the walk to root does not create a cycle in the parent graph, the parent graph can contain a cycle on the way to the root, created in the meantime by some other processor. The walk to root used in the sequential algorithm would stay in this cycle forever. Amortizing the walk to root brings similar problems. We propose a modification of the walk to root which solves both problems.

Each processor maintains a counter of started WTR procedures. The WTR procedure marks each vertex through which it proceeds with the name of the vertex where the walk has been started (origin) and with the current value of the processor counter (stamp). When the walk reaches a vertex that is already marked with the same origin and stamp, a negative cycle is detected and the computation is terminated. In a distributed environment more than one walk can be running concurrently, and a walk may reach a vertex that is already marked by some other mark. In that case we use the ordering on vertices to decide whether to finish the walk or to overwrite the previous mark and continue. If the walk has been finished (i.e. it has reached the root or a vertex marked by a higher origin, line 9 of WTR), its marks need to be removed. This is done by the REM (REmove Marks) procedure, which follows the path in the parent graph starting from the origin in the same manner as WTR does. The values $p(v)$ of marked vertices are not changed (line 6 of RGS), and therefore the REM procedure can find and remove the marks. However, because walks can overwrite each other's marks, the REM procedure may not remove all marks; the remaining marks will eventually be removed by some other REM procedure. The correctness of cycle detection is guaranteed because equality of both the origin and the stamp is required for a cycle to be reported.
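The marking scheme can be illustrated in a single address space, ignoring messages and ownership. This is a hypothetical sketch, not the distributed implementation: parent pointers and marks are plain dicts, a mark is an `(origin, stamp)` tuple, and Python's lexicographic tuple comparison stands in for the ordering on walks, which the text above does not spell out in detail.

```python
class NegativeCycle(Exception):
    pass

def wtr(p, mark, source, origin, stamp, at):
    """Walk to root with (origin, stamp) marks, following the guards of WTR.
    Returns True when the walk finished normally (REM should then run)."""
    tag = (origin, stamp)
    while True:
        if mark.get(at) == tag:                 # own mark seen again: cycle
            raise NegativeCycle(tag)
        if at == source or (mark.get(at) is not None and mark[at] > tag):
            return True                         # root or a higher walk reached
        mark[at] = tag                          # no mark or a lower one: overwrite
        at = p[at]

def rem(p, mark, origin, stamp):
    """Remove the marks of walk (origin, stamp), starting from origin.
    Stops at the first vertex not carrying this mark (e.g. overwritten)."""
    tag = (origin, stamp)
    at = origin
    while mark.get(at) == tag:                  # the source is never marked
        del mark[at]
        at = p[at]
```

When a higher mark has overwritten part of a walk, `rem` stops early and leaves that mark in place, which mirrors the remark that some marks are removed only later by another REM.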

The modified walk to root forces the Trace procedure to stop when it reaches a marked vertex and to wait until the vertex becomes unmarked. Moreover, the walk to root is not called in every relaxation step (the WTR_amortization condition becomes true every n-th time it is evaluated).

Whenever a processor has to process a vertex (during the traversal or the walk to root), it checks whether the vertex belongs to its own subgraph. If the vertex is local, the processor continues locally; otherwise a message is sent to the owner of the vertex. The algorithm periodically checks incoming messages (line 4 of Trace). When a request to update the parameters of a vertex u arrives, the processor compares the current value $d(u)$ with the received one. If the received value is lower than the current one, the request is placed into the local queue.
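The interplay between accepting update requests and the test `req.length = d(req.vertex)` in Main can be sketched as follows. The function names are hypothetical, and two details are assumptions not stated in the text: that $d(u)$ is lowered as soon as a request is queued (which is what makes the equality test in Main discard stale requests), and that the local queue is FIFO.

```python
import math
from collections import deque

def handle_update(d, queue, vertex, father, length):
    """Handle an 'update u, v, length' message: queue it only if it lowers d(u)."""
    if length < d.get(vertex, math.inf):
        d[vertex] = length              # assumption: d(u) is lowered on receipt
        queue.append((vertex, father, length))

def current_requests(d, queue):
    """Pop requests in FIFO order, as Main does, skipping stale ones."""
    while queue:
        vertex, father, length = queue.popleft()
        if length == d[vertex]:         # still the best known value
            yield vertex, father        # Main would now call Trace(vertex, father)
```

If two improving updates for the same vertex arrive before it is processed, only the later (lower) one starts a traversal; the earlier request is popped and silently dropped.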
1  proc Main
2    while not finished do
3      req := pop(queue);
4      if req.length = d(req.vertex) then Trace(req.vertex, req.father); fi
5    od
6  end



1  proc Trace(v, father)
2    p(v) := father;
3    while v ≠ father do
4      Handle_messages;
5      u := RGS(v, NULL);
6      while u does not exist do
7        last := v; v := p(v);
8        u := RGS(v, last); od
9      v := u;
10   od
11 end
1  proc RGS(v, last) {Relax and Get Successor}
2    u := successor of v greater than last;
3    while u exists do
4      if u is local then
5        if d(u) > d(v) + l(v, u) then
6          if mark(u) then wait; fi
7          p(u) := v;
8          d(u) := d(v) + l(v, u);
9          if WTR_amortization then WTR([u, stamp], u); inc(stamp); fi
10         return u;
11       fi
12     else send_message(owner(u), “update u, v, d(v) + l(v, u)”);
13     fi
14     u := next successor of v;
15   od
16   return u does not exist;
17 end
1  proc WTR([origin, stamp], at) {Walk To Root}
2    done := false;
3    while ¬done do
4      if at is local
5      then
6        if mark(at) = [origin, stamp]
7             send_message(Manager, “negative cycle found”);
8             terminate
9        [] (at = source) ∨ (mark(at) > [origin, stamp])
10            if origin is local
11              then REM([origin, stamp], origin)
12              else send_message(owner(origin), “start REM([origin, stamp], origin)”) fi;
13            done := true
14       [] (mark(at) = nil) ∨ (mark(at) < [origin, stamp])
15            mark(at) := [origin, stamp];
16            at := p(at)
17       fi
18     else send_message(owner(at), “start WTR([origin, stamp], at)”);
19          done := true
20     fi
21   od
22 end



Fig. 3. Pseudo-code of the distributed algorithm 
Whenever a traversal ends, the next request is popped from the queue and a new traversal is started.

Another type of message is a request to continue the walk to root (resp. the removal of marks), which is immediately satisfied by executing the WTR (resp. REM) procedure.

The distributed algorithm terminates when all local queues of all processors are empty and there are no pending messages, or when a negative cycle is detected. A manager process is used to detect termination and to finish the algorithm by sending a termination signal to all processors.

Theorem 3 (Correctness and complexity of the distributed algorithm).

If G has no negative cycle reachable from the source s, then the distributed algorithm terminates with $d(v) = \delta(s, v)$ for all $v \in V$ and $G_p$ forms a shortest-paths tree rooted at s. If G has a negative cycle, its existence is reported.

The worst-case time complexity of the algorithm is $O(Cn^3)$.

Proof: The proof of the correctness of the distributed algorithm is technically more involved and, due to space limits, is presented only in the full version of the paper. The basic ideas are the same as in the sequential case, especially when G has no negative cycle. The proof of correctness of the distributed walk to root strategy is based on the ordering on walks and on the fact that, if G contains a reachable negative cycle, then after a finite number of relaxation steps $G_p$ always has a cycle.

The complexity is $O(Cn^3)$ due to the amortization of the walk to root. ■

6 Experiments 

We have implemented the distributed algorithm. The experiments were performed on a cluster of seven workstations interconnected by fast 100 Mbps Ethernet, using the Message Passing Interface (MPI) library.

We have performed a series of practical experiments on particular types of graphs that represent the LTL model checking problem. The LTL model checking problem is defined as follows: given a finite system and an LTL formula, decide whether the system satisfies the formula. This problem can be reduced to the problem of finding accepting cycles in a directed graph [11] and has linear sequential complexity. In practice, however, the resulting graph is usually very large, and the linear algorithm is based on depth-first search, which makes it hard to distribute. We have reduced the model checking problem to the SSSP problem with edge lengths 0 and −1. Instead of looking for accepting cycles, we detect negative cycles.
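One natural way to obtain such an instance, sketched below as an assumption since the text does not say which edges receive length −1, is to give every edge leaving an accepting state length −1 and every other edge length 0. A cycle then has negative length exactly when it passes through an accepting state. For the check itself the sketch uses plain sequential Bellman-Ford, swapped in for illustration; it is not the distributed algorithm of this paper, and the function names are hypothetical.

```python
def accepting_to_weighted(edges, accepting):
    """Assumed reduction: edges leaving an accepting state get -1, others 0."""
    return [(u, v, -1 if u in accepting else 0) for u, v in edges]

def has_reachable_negative_cycle(vertices, weighted_edges, source):
    """Sequential Bellman-Ford check (not the paper's distributed algorithm)."""
    dist = {v: None for v in vertices}          # None plays the role of infinity
    dist[source] = 0
    for _ in range(len(dist) - 1):
        changed = False
        for u, v, w in weighted_edges:
            if dist[u] is not None and (dist[v] is None or dist[u] + w < dist[v]):
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False                        # labels are final: no negative cycle
    # An edge that can still be relaxed reveals a reachable negative cycle.
    return any(dist[u] is not None and (dist[v] is None or dist[u] + w < dist[v])
               for u, v, w in weighted_edges)
```

For the edges `[(0, 1), (1, 2), (2, 1), (2, 3), (3, 3)]` from initial state 0, marking state 3 or state 1 as accepting yields a negative cycle, while marking only state 0 does not, because 0 lies on no cycle.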

The experimental results confirm that, for LTL model checking, our algorithm is able to verify systems that were beyond the reach of the sequential model checking algorithm.

Part of our experimental results is summarized in the table below. The table shows how the number of computers influences the computation time. Time is given in minutes (in the format m:ss); ‘M’ means that the computation failed due to insufficient memory.
                    Number of computers
No. of vertices     1      2      3      4      5      6      7
94578               0:38   0:35   0:26   0:21   0:18   0:17   0:15
608185              5:13   4:19   3:04   2:26   2:03   1:49   1:35
777488              M      6:50   4:09   3:12   2:45   2:37   2:05
736400              M      M      M      6:19   4:52   4:39   4:25
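The relative speedup $T(1)/T(p)$ can be read off the two rows of the table that also completed on a single computer. The sketch below assumes that the entries are minutes and seconds (m:ss); the helper names are hypothetical.

```python
def seconds(entry):
    """Convert an assumed 'm:ss' table entry to seconds."""
    minutes, secs = entry.split(":")
    return 60 * int(minutes) + int(secs)

def speedups(times):
    """Relative speedup T(1)/T(p) for p = 1..7 computers, rounded to 2 places."""
    base = seconds(times[0])
    return [round(base / seconds(t), 2) for t in times]

# Rows of the table that did not run out of memory on one computer.
ROWS = {
    94578: ["0:38", "0:35", "0:26", "0:21", "0:18", "0:17", "0:15"],
    608185: ["5:13", "4:19", "3:04", "2:26", "2:03", "1:49", "1:35"],
}

for vertices, times in ROWS.items():
    print(vertices, speedups(times))
```

Under this reading, seven computers give a speedup of about 2.5 on the smaller graph and about 3.3 on the larger one.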



7 Conclusions 

We have proposed a distributed algorithm for the single-source shortest paths problem for arbitrary directed graphs, which may contain negative length cycles. The algorithm employs reverse search and uses one data structure for two purposes: computing the shortest paths and traversing the graph. A novel distributed variant of the walk to root negative cycle detection strategy is employed. The algorithm is thus space-efficient and scalable.

Because of the wide variety of relaxation and cycle detection strategies, there is plenty of room for future research. Although not all strategies are suitable for a distributed solution, there are surely other possibilities besides the one proposed in this paper.
Abstract. We present a language Scc for the specification of the direct exchange and/or the global sharing of information in multi-agent systems. Scc is based on the concurrent constraint programming paradigm, which we modify in such a way that agents can (i) maintain their local private store, (ii) share (read/write) the information in the global store, and (iii) communicate with other agents (via multi-party or hand-shake communication). To justify our proposal we compare Scc to a recently proposed language for the exchange of information in multi-agent systems. We also provide an operational semantics of Scc. The full semantic treatment is only sketched here and is given elsewhere.



1 Introduction 

A multi-agent system is a system composed of several autonomous agents that operate in a distributed environment, which they can perceive, reason about, and affect by performing actions. In current research on multi-agent systems, a major topic is the development of a standardised agent communication language for the exchange of information. Several languages have been proposed, e.g. [7,9,12,4]. Recently de Boer et al. [4] have also, for the first time, introduced a formal semantic theory for the exchange of information in multi-agent systems. Their approach uses principles of concurrent constraint programming (CCP) to model the local behaviour of agents, while communication is modelled by a standard process-algebraic hand-shake approach.

Our proposal is based solely on the CCP paradigm; however, the semantics of the mechanisms for updating and testing the (local/global) store(s) are changed.

CCP has inherited two of the main features of concurrent logic programming, namely the asynchronous character of communication and the monotonic update of the store. In recent years some work has been done to lift both features. On the one hand, de Boer et al. [5] have proposed non-monotonic updates of the store and have studied compositional and fully abstract semantics for them. On the other hand, Saraswat proposed a synchronisation mechanism in [10]. Briefly, his proposal and many related ones (e.g. [6]) are based on coding an explicit operator to achieve synchrony. This should be contrasted with the classical concurrent constraint framework, in which asynchronous communication is simply obtained by the blocking of ask primitives when the information in the store is not complete enough to entail the asked constraints. Following these lines, a natural alternative way to obtain synchronous communication in CCP is to force ask and tell primitives to synchronise in some way. We elaborate further on this idea by proposing new versions of these primitives.

As in the classical CCP framework, our proposal makes use of tell and ask primitives. However, a new perspective is taken in that, to be reduced, any tell(c) operation needs an ask(c') partner. Restated in other terms, if the tell primitives are seen as producers of new information and ask primitives as consumers, the new primitives consist of lazy tell (or “just-in-time”) producers forced to synchronise on their consumer asks. Stress is put on the novelty of information, i.e. on the fact that the told information should not be entailed by the current store. Consequently, any tell(c) and ask(c) operations whose constraint argument c is entailed by the current store can proceed autonomously.

The general scheme is enriched by permitting the synchronisation of more than two partners. Further, we allow some of the tell primitives not to update the store. These primitives are subsequently called fictitious and are denoted by ftell. They can be used to transmit information which is not (yet) entailed by the (global/local) store, an important possibility in distributed systems.

Compared with the works on synchronisation cited above, the advantage of our approach is that it permits specifying on what information the synchronisation should be made, rather than with whom. Our synchronisation is thus data-oriented as opposed to process-oriented (while still keeping the possibility to specify the latter approach as a derived operator). An interesting consequence from a software engineering point of view is that in the specification of an agent (process) it is not necessary to know in advance with which other agents synchronisation should take place. Modularity is thus gained.

Our aim is not to present a new programming language but rather to introduce new variants of the tell and ask primitives, to justify our proposal via a comparison with [4] (demonstrating expressiveness), and to present a semantics for them. To sum up, our approach allows each agent (i) to maintain its local private store, (ii) to share (read/write) global information in its hierarchy of global stores, and (iii) to communicate (via multi-party or hand-shake communication) with other agents. To achieve this we also largely employ the standard CCP constructs for hiding local variables ($\exists_X$) and for parameter passing ($d_{xy}$); see Definition 1.

The rest of this paper is organised as follows. In Section 2 we present the syntax and an informal semantics of Scc. To justify our proposal we compare it with the language of [4] in Section 3, while in Section 4 we give an operational semantics (SOS rules and final result). We conclude by summarising the full semantic treatment of Scc and by suggesting some future research directions.
2 Language Scc

This section presents the syntax and the informal semantics of the language underlying the Scc paradigm, also called Scc. As in [11], the constraint system underlying the Scc language consists of any system of partial information that supports the entailment relation. We assume a given cylindric constraint system $(\mathcal{C}, \vdash)$ over a set of variables Svar, defined as usual from a simple constraint system $(D, \vdash)$ as follows.

Definition 1. Let Svar be a denumerable set of variables (denoted by x, y, ...) and let $(D, \vdash)$ be a simple constraint system. Let $\wp_f(D)$ denote the set of finite subsets of D. For each variable $x \in Svar$ a function $\exists_x : \wp_f(D) \to \wp_f(D)$ is defined such that for any $c, d \in \wp_f(D)$ the conditions (E1) to (E4) are satisfied. Moreover, for each $x, y \in Svar$ the elements $d_{xy} \in D$ are diagonal elements iff they satisfy the conditions (E5) to (E7). Here $\sim$ denotes mutual entailment.

(E1) $c \vdash \exists_x(c)$
(E2) $c \vdash d$ implies $\exists_x(c) \vdash \exists_x(d)$
(E3) $\exists_x(c \wedge \exists_x(d)) \sim \exists_x(c) \wedge \exists_x(d)$
(E4) $\exists_x(\exists_y(c)) \sim \exists_y(\exists_x(c))$
(E5) $\emptyset \vdash d_{xx}$
(E6) $\{d_{xy}\} \sim \exists_z(\{d_{xz}, d_{zy}\})$ whenever $z \neq x$ and $z \neq y$
(E7) $\{d_{xy}\} \wedge \exists_x(c \wedge \{d_{xy}\}) \vdash c$

Then $(\wp_f(D)/_{\sim}, \vdash)$ is a cylindric constraint system (over Svar). We denote $\exists_x(c)$ by $\exists_x c$, and for a set $X = \{x_1, \ldots, x_n\}$ we denote $\exists_{x_1} \cdots \exists_{x_n} c$ by $\exists_X c$.

The language description is parametric with respect to $(\mathcal{C}, \vdash)$, and so are the semantic constructions presented.
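To make the axioms of Definition 1 concrete, here is a toy semantic model, offered as an illustration only and not as the syntactic construction over $\wp_f(D)$: a constraint over two variables x, y with values in {0, 1, 2} is the set of valuations satisfying it, entailment is set inclusion, conjunction is intersection, $\exists_x$ forgets the value of x (cylindrification), and the diagonal $d_{xy}$ is the constraint x = y. All names are hypothetical.

```python
from itertools import product

DOM = range(3)                                  # toy value domain {0, 1, 2}
VARS = ("x", "y")
ALL = frozenset(product(DOM, repeat=len(VARS))) # every valuation (x, y)

def constraint(pred):
    """The set of valuations satisfying pred(x, y)."""
    return frozenset(val for val in ALL if pred(*val))

def entails(c, d):
    return c <= d                               # c ⊢ d iff every model of c is a model of d

def conj(c, d):
    return c & d

def exists(var, c):
    """Cylindrification: a valuation survives if changing `var` can land it in c."""
    i = VARS.index(var)
    return frozenset(val for val in ALL
                     if any(val[:i] + (a,) + val[i + 1:] in c for a in DOM))

diag_xy = constraint(lambda x, y: x == y)       # the diagonal element d_xy
```

With c given by x + y = 2, for instance, $\exists_x c$ holds for every valuation, and (E7) holds because conjoining c with $d_{xy}$ leaves only x = y = 1.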

We use G, possibly subscripted, to range over the set Sgoal of processes, c, d, ... to range over basic constraints (i.e. constraints which are equivalent to a finite set of primitive constraints), and X, Y, ... to range over subsets of Svar. Processes $G \in Sgoal$ are defined by the following grammar:

$$G ::= \Delta \mid ask(c) \mid tell(c) \mid ftell(c) \mid G; G \mid G + G \mid G \parallel G \mid \exists_X G \mid p(t)$$

Let us briefly discuss the informal meaning of our language constructs. The constant Δ denotes a successfully terminated process. The atomic constructs ask(c) and tell(c) act on a given store in the following way. As usual, given a constraint c, the process ask(c) succeeds if c is entailed by the store; otherwise it is suspended until it can succeed. However, the process tell(c), which is lazier than the classical one, succeeds only if c is (already) entailed by the store, in which case it does not modify the store; otherwise it suspends. It is resumed by a concurrently suspended ask(d) operation, provided that the conjunction of c and the store entails d. In that case, both the tell and the ask are synchronously resumed and, at the same time, the store is atomically augmented with the constraint c. The atomic construct ftell(c) behaves as tell, except that the store is not augmented with the constraint c.

The sequential composition $G_1; G_2$ and the nondeterministic choice $G_1 + G_2$ have standard meanings (the latter being a global choice, as the selection of a component can be influenced by the store and by the environment of the process as well).
The parallel composition $G_1 \parallel G_2$ represents both the interleaving (merge) of the computation steps of the components involved (provided they can perform these steps independently of each other) and synchronisation, as in the case of tell, ftell and ask described above. Note that in general there can be a parallel composition of a finite set of tells and ftells and a finite set of asks such that the store together with the conjunction of the tell and ftell constraints entails the ask constraints. In this case all the components synchronise. This is sometimes called multi-party synchronous communication.
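The synchronisation rules just described can be sketched as a single reduction step over a toy constraint system in which a constraint is a set of atomic facts and entailment is set inclusion. This is an illustrative sketch under explicit assumptions: variables, hiding, sequencing and choice are ignored; the nondeterministic choice of a synchronising group is resolved deterministically (smallest group of tells first, and every ask that becomes entailed joins the step); and the function names are hypothetical.

```python
from itertools import combinations

def entails(store, c):
    """Toy entailment: a constraint is a set of atomic facts."""
    return c <= store

def step(store, tells, asks):
    """One reduction step over pending tells/ftells and asks.

    tells: list of (kind, constraint) with kind "tell" or "ftell".
    Returns (new_store, fired_tell_indices, fired_ask_indices), or None if all suspend.
    """
    # A primitive whose constraint is already entailed proceeds on its own;
    # an entailed tell leaves the store unchanged.
    for i, (_, c) in enumerate(tells):
        if entails(store, c):
            return store, [i], []
    for j, c in enumerate(asks):
        if entails(store, c):
            return store, [], [j]
    # Otherwise look for a group of tells whose joint information makes
    # at least one ask succeed; all of them synchronise.
    for k in range(1, len(tells) + 1):
        for group in combinations(range(len(tells)), k):
            told = frozenset().union(*(tells[i][1] for i in group))
            partners = [j for j, c in enumerate(asks) if entails(store | told, c)]
            if partners:
                # Only real tells augment the store; ftells are fictitious.
                added = frozenset().union(*(tells[i][1] for i in group
                                            if tells[i][0] == "tell"))
                return store | added, list(group), partners
    return None
```

A single tell with a matching ask is a hand-shake; two tells {a} and {b} jointly answering ask({a, b}) is a three-party synchronisation; an ftell partner fires without changing the store.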

The block construct $\exists_X G$ behaves like the process G with the variables in X considered as local: it hides the information about the variables from X within the process G. Finally, p(t) is a procedure call, where p is the name of a procedure and t is a list of actual parameters. Its meaning is given w.r.t. a set of procedure declarations, or program; each such declaration is a construct of the form $p(x_1, \ldots, x_n) :- G$, where $x_1, \ldots, x_n$ are distinct variables and G is a goal.

Finally, we note that it is quite easy to recover the traditional concurrent constraint paradigm within our framework by introducing an asynchronous tell. This can be specified by providing, for each constraint to be told, a corresponding concurrent ask operation. Hence this derived operator atell (standing for asynchronous tell) can be defined as

atell(c) :- tell(c) || ask(c).

Note that the simulation of our primitives by the old ones is not so straightforward and involves auxiliary tells and asks as well as the coding of a manager.



3 Specification of Multi-agent Systems in Scc

In [4] a multi-agent programming language (which we will refer to in this paper as MAL) has been introduced. In this section we show how the exchange of information in multi-agent systems can be defined in Scc. To this end we represent expressions from MAL in the framework of Scc. We want to justify that our language Scc can be seen as a formal multi-agent programming language as well. Furthermore, we show that some aspects of the behavior of multi-agent systems which cannot be covered by MAL have (simple) specifications in Scc. The definition of the language MAL is taken from [4].

In the following definitions we assume a given set Chan of communication channels, with typical elements α, and a set Proc of procedure identifiers, with typical elements p. We also suppose that the set of variables Svar is divided into two disjoint subsets, $Svar = ChanVar \cup AgentVar$. Typical elements of ChanVar are w; typical elements of AgentVar are x, y. The variables from ChanVar will be used to model communication via channels, while AgentVar is the set of the agents' variables. We also suppose that the agents' variables are split into local and global ones. This is because in MAL there is a global store that is distributed over the agents. Each agent has direct access only to its private store. Information in the private store about the global variables can be communicated to the other agents. The local variables of an agent cannot be referred to in communications.
Definition 2 (Basic actions). Given a cylindric constraint system $(\mathcal{C}, \vdash)$, the basic actions of the programming language MAL are defined as follows:

$$a ::= \alpha!c \mid \alpha?c \mid ask(c) \mid tell(c)$$

The execution of the output action $\alpha!c$ consists of sending the information c along the channel α, which has to synchronize with a corresponding input $\alpha?d$, for some d with $c \vdash d$. In other words, the information c can be sent along a channel α only if some information entailed by c is requested. The execution of an input action $\alpha?d$, which consists of receiving the information c along the channel α, also has to synchronize with a corresponding output $\alpha!c$, for some c with $c \vdash d$. The execution of a basic action ask(c) by an agent consists of checking whether the private store of the agent entails c. On the other hand, the execution of tell(c) consists of adding c to the private store.

Representing Basic Actions in Scc

With each channel α we associate a variable $w_\alpha$ from ChanVar. The actions ask(c) and tell(c) behave in the same way in both languages and are hence represented by the same expressions. Sending information along a channel α is modeled by the Scc action $ftell(w_\alpha = true \wedge c)$. This action has to synchronize with a corresponding ask action. As ftell does not update the information in the store, the corresponding ask must be sequentially followed by an asynchronous tell action, which actually stores the information. Hence, receiving information is modeled as the sequence $ask(w_\alpha = true \wedge c); atell(c)$. The representation of MAL basic actions in Scc is summarized in the following table.



MAL        Scc
ask(c)     ask(c)
tell(c)    tell(c)
α!c        ftell(w_α = true ∧ c)
α?c        ask(w_α = true ∧ c); atell(c)



Definition 3 (Statements). MAL agents (statements S) are defined as:

$$S ::= a.S \mid S_1 + S_2 \mid S_1 \,\&\, S_2 \mid \exists_x S \mid p(x)$$

Statements are thus built up from the basic actions using the following standard programming constructs: action prefixing (denoted by “.”), non-deterministic choice (denoted by “+”), internal parallelism (denoted by “&”), local variables (denoted by $\exists_x S$, which indicates that x is a local variable in S), and (recursive) procedure calls of the form p(x), where x denotes a sequence of variables which constitute the actual parameters of the call.

Representing Statements in Scc

With the exception of prefixing, all the statements are directly represented by the corresponding Scc expressions. Prefixing is modeled by sequential composition.
MAL          Scc
a.S          a; S
S1 + S2      S1 + S2
S1 & S2      S1 || S2
∃x S         ∃x S
p(x)         p(x)



Definition 4 (Multi-agent systems). A multi-agent system A of MAL is defined as

$$A ::= \langle D, S, c \rangle \mid A_1 \parallel A_2 \mid \delta_H(A)$$

A basic agent in a multi-agent system is represented by a tuple $\langle D, S, c \rangle$ consisting of a set D of procedure declarations of the form $p(x) :- S$, where x denotes the formal parameters of p and S denotes its body, a statement S, and a private store c. The statement S in $\langle D, S, c \rangle$ describes the behavior of the agent with respect to its private store c. The threads of S, i.e. the concurrently executing sub-statements of S, interact with each other via the private store of the basic agent by means of the actions ask(d) and tell(d). Additionally, a multi-agent system itself consists of a collection of concurrently operating agents that interact with each other only via a synchronous information-passing mechanism by means of the communication actions $\alpha!d$ and $\alpha?d$. (In [4] the authors provide only the parallel composition of agent systems; the semantic treatment of sequential and non-deterministic composition of agent systems is standard.)

Representing Multi-agent Systems in Scc

The parallel operator || is represented as the asynchronous parallel operator || of Scc. The operator $\delta_H(A)$ is represented as $\exists_{W_H}(A)$, where $W_H$ is the set of variables $w_\alpha$ associated with the channels $\alpha \in H$. The encapsulation is thus achieved by making the channels from H local. Communication among concurrently operating agents is achieved by the synchronous communication mechanism of Scc. In particular, the pair ask, ftell allows information to be communicated synchronously between two agents without storing it in the global store. On the other hand, the pair ask, tell allows for multi-agent communication among several processes, with the communicated information being stored. We summarize the translation in the following table.



MAL           Scc
⟨D, S, c⟩     Scc program
A1 || A2      A1 || A2
δ_H(A)        ∃_{W_H}(A)
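The translation summarised in the tables above is compositional and can be sketched as a recursive function. The tuple encoding of MAL terms and the string output are assumptions made purely for illustration, and the function names are hypothetical. The basic agent $\langle D, S, c \rangle$ is left out, because the table only states that it becomes a program in the target language.

```python
def action(a):
    """Translate a MAL basic action, encoded as a tuple, into the target notation."""
    kind = a[0]
    if kind in ("ask", "tell"):              # same expression in both languages
        return f"{kind}({a[1]})"
    if kind == "out":                        # α!c: fictitious tell on w_α
        _, chan, c = a
        return f"ftell(w_{chan} = true ∧ {c})"
    if kind == "in":                         # α?c: ask on w_α, then store c
        _, chan, c = a
        return f"ask(w_{chan} = true ∧ {c}); atell({c})"
    raise ValueError(f"unknown action {kind!r}")

def statement(s):
    """Translate a MAL statement; prefixing becomes sequential composition."""
    kind = s[0]
    if kind == "prefix":
        return f"{action(s[1])}; {statement(s[2])}"
    if kind == "choice":
        return f"({statement(s[1])} + {statement(s[2])})"
    if kind == "par":                        # MAL & becomes ||
        return f"({statement(s[1])} || {statement(s[2])})"
    if kind == "local":
        return f"∃{s[1]}({statement(s[2])})"
    if kind == "call":
        return f"{s[1]}({', '.join(s[2])})"
    raise ValueError(f"unknown statement {kind!r}")

def encapsulate(channels, system):
    """δ_H(A) becomes ∃_{W_H}(A): hide the variables w_α of the channels in H."""
    ws = ", ".join(f"w_{chan}" for chan in sorted(channels))
    return f"∃{{{ws}}}({system})"
```

For example, the MAL statement α?(x=1).p(x) on a channel named `a` becomes `ask(w_a = true ∧ x=1); atell(x=1); p(x)`.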



Global Multi-agent Communication 

In contrast to MAL, our Scc language allows both asynchronous and synchronous multi-agent communication. Besides a synchronous ftell action, an Scc agent can also perform tell, ftell and ask actions on the global store. If an agent uses atell, then it just communicates some piece of information to all processes, i.e. it makes the information generally accessible. If an agent uses a synchronous tell action on the global store, then there must be at least one agent waiting for this information, and the communication is synchronous in this case. However, as the information is stored in the global store in this case, it becomes accessible to any agent. This more general way of transmitting information among agents makes Scc a more general and more flexible language than MAL for the specification and implementation of the exchange of information in multi-agent systems.



4 Operational Semantics O of Scc

Contexts 

It turns out that it is possible to treat the sequential and parallel composition operators of Scc in a very similar way by introducing the auxiliary notion of context. Basically, a context consists of a partially ordered structure where place holders (subsequently referred to by □) have been inserted at a top-level place, i.e. a place not constrained by the previous execution of other atoms. Viewing goals as partially ordered structures too, the ask and tell primitives to be reduced are those which can be substituted by a place holder □ in a context. Furthermore, the goals resulting from the reductions are essentially obtained by substituting the place holder by the corresponding clause bodies or by Δ, depending on whether an atom or an ask/tell primitive is considered.

Definition 5. Contexts are functions inductively defined on goals as follows:

1. A nullary context is associated with any goal. It is represented by the goal and is defined as the constant mapping from Sgoal to this goal.

2. □ is a unary context that maps any goal to itself. For any goal G, this application is subsequently referred to as □[G]. Thus □[G] = G for any goal G.

3. If tc is an n-ary context and G is a goal, then (tc; G) is an n-ary context. Its application is defined as follows: for any goals $G_1, \ldots, G_n$,

$$(tc; G)[G_1, \ldots, G_n] = (tc[G_1, \ldots, G_n]; G)$$

4. If $tc_1$ and $tc_2$ are m-ary and n-ary contexts, then $tc_1 \parallel tc_2$ is an (m+n)-ary context. Its application is defined as follows: for any goals $G_1, \ldots, G_{m+n}$,

$$(tc_1 \parallel tc_2)[G_1, \ldots, G_{m+n}] = (tc_1[G_1, \ldots, G_m]) \parallel (tc_2[G_{m+1}, \ldots, G_{m+n}])$$
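To make Definition 5 concrete, here is a minimal Python sketch in which contexts and goals are tagged tuples. The representation and the names `arity`, `apply`, `simplify` and `EMPTY` are hypothetical, chosen only for this illustration. It computes the arity of a context, plugs goals into its holes following cases 1 to 4, and then drops Δ as a unit element; treating Δ as neutral for ";" as well as for "||" is an assumption of the sketch.

```python
EMPTY = "Δ"  # the empty goal

# Contexts (hypothetical encoding):
#   ("const", G)       nullary context of goal G             (case 1)
#   ("hole",)          the unary context □                   (case 2)
#   ("seq", tc, G)     (tc; G): holes only on the left side  (case 3)
#   ("par", tc1, tc2)  tc1 || tc2                            (case 4)
# Goals: atoms are strings, ("seq", g1, g2) is g1; g2, ("par", g1, g2) is g1 || g2.

def arity(tc):
    kind = tc[0]
    if kind == "const":
        return 0
    if kind == "hole":
        return 1
    if kind == "seq":
        return arity(tc[1])
    return arity(tc[1]) + arity(tc[2])          # "par": m + n holes

def apply(tc, *goals):
    """Compute tc[G_1, ..., G_n] as in Definition 5."""
    if len(goals) != arity(tc):
        raise ValueError("wrong number of goals for this context")
    kind = tc[0]
    if kind == "const":
        return tc[1]
    if kind == "hole":
        return goals[0]
    if kind == "seq":
        return ("seq", apply(tc[1], *goals), tc[2])
    m = arity(tc[1])                             # first m goals go to tc1
    return ("par", apply(tc[1], *goals[:m]), apply(tc[2], *goals[m:]))

def simplify(goal):
    """Remove empty goals, using Δ as a unit (assumed also for ';')."""
    if isinstance(goal, str):
        return goal
    op, left, right = goal
    left, right = simplify(left), simplify(right)
    if left == EMPTY:
        return right
    if right == EMPTY:
        return left
    return (op, left, right)

# (□; p) || □ : two top-level places, e.g. for an ask and a tell.
tc = ("par", ("seq", ("hole",), "p"), ("hole",))
```

Plugging Δ into both holes and simplifying mirrors what a reduction of the two selected primitives leaves behind.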

In what follows, goals are considered modulo the syntactical congruence induced by the associativity of “||” and “+”, by the commutativity of “||” and “+”, and by Δ as the unit element. We also simplify the goals resulting from the application of contexts accordingly.

Transition System 

The operational semantics of See is defined in Plotkin’s style ([8]) by means of 
a transition system, which is itself defined by rules of the form 




208 



Lubos Brim et al. 



$$\frac{\text{Assumptions}}{\text{Conclusion}} \quad \text{if Conditions}$$

where Assumptions and Conditions may possibly be absent. Configurations tra- 
ditionally describe the statement to be computed and a state summing up the 
computations made so far. Rephrased in the See context, the configurations to 
be considered here comprise a goal to be reduced together with a store. In the 
following definition Sstore denotes the set of stores. 



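Definition 6. The transition relation $\rightarrow$ is defined as the smallest relation of $(Sgoal \times Sstore) \times (Sgoal \times Sstore)$ satisfying the rules¹ of Figure 1. We write $\langle G, \sigma\rangle \rightarrow \langle G', \sigma'\rangle$ rather than $(\langle G, \sigma\rangle, \langle G', \sigma'\rangle) \in {\rightarrow}$.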



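(T) $\langle tc[sp_1, \ldots, sp_m], \sigma\rangle \rightarrow \langle tc[\Delta, \ldots, \Delta], \tau\rangle$ if

$$\left\{\begin{array}{l}
\{sp_1, \ldots, sp_m\} = \{ask(a_1), \ldots, ask(a_p),\\
\qquad tell(at_1), \ldots, tell(at_q),\\
\qquad tell(rt_1), \ldots, tell(rt_r),\\
\qquad ftell(af_1), \ldots, ftell(af_s),\\
\qquad ftell(rf_1), \ldots, ftell(rf_t)\},\\
\sigma \cup \{rt_1, \ldots, rt_r\} \cup \{rf_1, \ldots, rf_t\} \vdash \{a_1, \ldots, a_p\} \cup \{at_1, \ldots, at_q\} \cup \{af_1, \ldots, af_s\},\\
\text{there is no strict subset } S \text{ of } \{rt_1, \ldots, rt_r\} \cup \{rf_1, \ldots, rf_t\}\\
\qquad \text{such that } \sigma \cup S \vdash \{a_1, \ldots, a_p\} \cup \{at_1, \ldots, at_q\} \cup \{af_1, \ldots, af_s\},\\
\tau = \sigma \cup \{rt_1, \ldots, rt_r\}, \quad m > 0
\end{array}\right.$$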



Fig. 1. See transition rules for new versions of asks and tells



The Operational Semantics 

The sequential and parallel composition operators are handled through the notion of context within rule (T).

Rule (T) defines reductions of tell, ftell and ask primitives. The primitives to be reduced, referred to as $sp_1, \ldots, sp_m$, are partitioned into five categories: (1) the ask primitives (the multi-set $\{ask(a_1), \ldots, ask(a_p)\}$); the tell primitives, split into (2) those which add information to the store (the multi-set $\{tell(rt_1), \ldots, tell(rt_r)\}$) and (3) those which do not, i.e. whose information is already entailed (the multi-set $\{tell(at_1), \ldots, tell(at_q)\}$); and the fictitious primitives ftell, split in a similar way into (4) the multi-set $\{ftell(rf_1), \ldots, ftell(rf_t)\}$ and (5) the multi-set $\{ftell(af_1), \ldots, ftell(af_s)\}$, respectively.

All these primitives are then simultaneously reduced to the empty goal Δ when the information on the current store (σ), together with the new information told ($rt_1, \ldots, rt_r, rf_1, \ldots, rf_t$), entails the information of the other primitives. The new store consists in this case of the old store enriched by the new information told by the tell primitives ($rt_1, \ldots, rt_r$); the information of the ftell primitives is not stored. Note that this rule reflects the laziness feature of our tell primitives.

¹ Due to lack of space, the very standard rules for nondeterministic choice, hiding and procedure calls are not given in Figure 1.

An ask(c) primitive for a constraint c entailed by the current store σ can be reduced alone by rule (T), taking the unary context □ and m = 1, p = 1, q = 0, r = 0, s = 0, t = 0. The axiom

$$\langle ask(c), \sigma\rangle \rightarrow \langle \Delta, \sigma\rangle \quad \text{if } \sigma \vdash c \qquad (1)$$

results from rule (T) as a particular case.

A tell(c) primitive for a constraint c entailed by the current store σ can be reduced alone by rule (T), taking the unary context □ and m = 1, p = 0, q = 1, r = 0, s = 0, t = 0. The axiom

$$\langle tell(c), \sigma\rangle \rightarrow \langle \Delta, \sigma\rangle \quad \text{if } \sigma \vdash c \qquad (2)$$

results from rule (T) as a particular case as well.

Other tells and asks need each other for reduction and are reduced simultaneously. A minimality condition (see the side condition of (T)) is required to prevent outsider tells from being reduced by taking advantage of a concurrent reduction.
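The side conditions of rule (T) can be prototyped over a deliberately simple constraint system. In this Python sketch, which is an illustration and not the authors' implementation, a constraint is a frozenset of atomic facts and a store entails a list of constraints when the union of its constraints contains each of them; this toy entailment and the names `entails` and `rule_T` are assumptions. The parameters follow the five categories of primitives: `asks`, `at`, `rt`, `af`, `rf`.

```python
from itertools import combinations

def entails(store, required):
    """Toy entailment: each required constraint is contained in the known facts."""
    known = set().union(*store)
    return all(c <= known for c in required)

def rule_T(store, asks=(), at=(), rt=(), af=(), rf=()):
    """Check the side conditions of rule (T); return the new store tau or None."""
    if not (asks or at or rt or af or rf):          # m > 0
        return None
    required = [*asks, *at, *af]
    added = list(set(rt) | set(rf))                 # information told by rt and rf
    if not entails([*store, *added], required):
        return None
    # Minimality: no strict subset S of the added information may suffice.
    for size in range(len(added)):
        for subset in combinations(added, size):
            if entails([*store, *subset], required):
                return None
    return set(store) | set(rt)                     # ftell'd information is not stored

x, y, z = frozenset({"x"}), frozenset({"y"}), frozenset({"z"})
sigma = {x}
```

With sigma = {x}, an ask(y) paired with tell(y) reduces and stores y, the same pair with ftell(y) reduces without storing anything, while adding an outsider tell(z), or offering a lone tell(y) that no ask needs, makes the minimality check fail.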

To define the operational semantics we follow the logic programming tradition: the semantics specifies the final stores of the successful computations. It also indicates the stores corresponding to deadlock situations and distinguishes between two types of deadlock: failure, corresponding to the absence of suitable procedure declarations to reduce procedure calls, and suspension, corresponding to the absence of suitable data on the store, or of concurrent processes, that would allow tell and ask primitives to proceed, i.e. to suspended tells and asks. Note that, as illustrated by axioms (1) and (2) above, the two situations may be distinguished by a simple criterion: the existence of a store richer than the current one that would enable the computation to proceed. The following definition is based on this intuition. The symbols $\delta^+$, $\delta^-$, and $\delta^s$ are used to indicate computations ending in a success, a failure, and a suspension, respectively.

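Definition 7. The operational semantics $O : Sgoal \rightarrow \mathcal{P}(Sstore \times \{\delta^+, \delta^s, \delta^-\})$ is defined as the following function: for any goal G,

$$\begin{aligned}
O(G) = {} & \{\langle \tau, \delta^+\rangle : \langle G, true\rangle \rightarrow \cdots \rightarrow \langle \Delta, \tau\rangle\}\\
\cup {} & \{\langle \tau, \delta^s\rangle : \langle G, true\rangle \rightarrow \cdots \rightarrow \langle G', \tau\rangle \not\rightarrow, \text{ where } G' \neq \Delta \text{ and there are } \sigma', G'', \sigma'' \text{ such that } \langle G', \sigma'\rangle \rightarrow \langle G'', \sigma''\rangle\}\\
\cup {} & \{\langle \tau, \delta^-\rangle : \langle G, true\rangle \rightarrow \cdots \rightarrow \langle G', \tau\rangle \not\rightarrow, \text{ where } G' \neq \Delta \text{ and for any } \sigma',\ \langle G', \sigma'\rangle \not\rightarrow\}
\end{aligned}$$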
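The distinction drawn in Definition 7 can be illustrated on a toy language with only ask atoms and procedure calls, so the store never changes. In the sketch below, the goal encoding (a tuple of parallel atoms) and the names `step` and `observe` are hypothetical. It follows one computation until no transition applies and labels the final store as a success, a suspension (some richer store would let the goal move) or a failure (no store would). Unlike O, which collects the outcomes of all computations, it explores a single path and assumes that path terminates.

```python
def step(goal, store, decls):
    """One toy transition, or None if no atom of the goal can be reduced."""
    for i, (kind, arg) in enumerate(goal):
        rest = goal[:i] + goal[i + 1:]
        if kind == "ask" and arg in store:
            return rest, store                  # the ask reduces to the empty goal
        if kind == "call" and arg in decls:
            return rest + decls[arg], store     # unfold the procedure body
    return None

def observe(goal, decls, store=frozenset()):
    """Run to a stuck configuration <G', tau> and return (tau, outcome)."""
    while (nxt := step(goal, store, decls)) is not None:
        goal, store = nxt
    if not goal:                                # G' is the empty goal
        return store, "success"
    # Entailment is monotone here, so trying the store enriched by every
    # pending ask is enough to decide whether some store lets G' move.
    richer = store | {arg for kind, arg in goal if kind == "ask"}
    if step(goal, richer, decls) is not None:
        return store, "suspension"
    return store, "failure"
```

A pending ask is a suspension because a richer store would release it, whereas a call with no declaration is a failure because no store can help.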



5 Conclusions 

We have presented a language for the specification of the exchange and/or the global sharing of information in multi-agent systems. It is based solely on the concurrent constraint programming paradigm, with slightly modified test and update operations. We have briefly compared it to the latest proposal in this area, given in [4].

Due to lack of space we have presented only an operational semantics. The reader can easily verify that it is not compositional. We refer to our studies in [2], where a compositional semantics is given and proved correct with respect to the semantics O (it is based on ‘hypothetical’ steps which can be made by both concurrent agents and the state of the global store, rather than on successive updates). In [3] an algebraic (failure) semantics is defined for a subset of See (only finite behaviours and without ftells). The algebraic semantics is proved to be sound and complete with respect to a compositional operational semantics. Allowing only handshake communications in the mentioned subset, we proposed a denotational semantics in [1]. We employed so-called testing techniques: this semantics uses monotonic sequences of labelled pairs of input-output states, possibly containing “hypothetical” gaps, and ending with marks reporting success or failure (to follow the logic programming tradition). This semantics is proved to be correct with respect to the operational semantics, and fully abstract as well.

Our future work aims at designing a fully abstract semantics for the full version of See. We are also studying the possibility of incorporating (discrete) real-time aspects.
Abstract. Scalable Distributed Data Structures (SDDS) are access methods specifically designed to satisfy the high performance requirements of a distributed computing environment made up of a collection of computers connected through a high speed network. In this paper we propose an order preserving SDDS with a worst-case constant cost for exact-search queries and a worst-case logarithmic cost for update queries. Since our technique preserves the ordering between keys, it is also able to answer range search queries with an optimal worst-case cost of O(k) messages, where k is the number of servers covering the query range. Moreover, our structure has an amortized almost constant cost for any single-key query.

Hence, our proposal is the first solution combining the advantages of the constant worst-case access cost featured by hashing techniques (e.g. LH*) and of the optimal worst-case cost for range queries featured by order preserving techniques (e.g. RP* and DRT). Furthermore, recent proposals for ensuring high availability of an SDDS can easily be combined with our basic technique. Therefore our solution is a theoretical achievement potentially attractive for network servers requiring both a fast response time and high reliability.

Finally, our scheme can easily be generalized to manage k-dimensional points, while maintaining the same costs as in the 1-dimensional case.

Keywords: scalable distributed data structure, message passing envi- 
ronment, multi-dimensional search. 



1 Introduction 

With the striking advances of communication technology, distributed computing environments are becoming more and more relevant. This is particularly true for the technological framework known as network computing: a fast network interconnecting many powerful and low-priced workstations, creating a pool of perhaps terabytes of RAM and even more of disc space. This is a computing environment well suited to managing large amounts of data and to providing high performance. In fact, the large amount of RAM collectively available, combined with the speed of the network, allows so-called RAM data management, which can deliver performance not attainable with standard secondary memory.

A general paradigm for developing access methods in such distributed environments was proposed by Litwin, Neimat and Schneider [9]: Scalable Distributed Data Structures (SDDSs). The main goal of an access method based on the SDDS paradigm is to manage very large amounts of data, implementing standard operations (i.e. insertions, deletions, exact searches, range searches, etc.) efficiently and aiming at scalability, i.e. the capacity of the structure to keep the same level of performance while the number of managed objects changes, and to avoid any form of bottleneck. In particular, a typical distributed structure made up of a set of data servers and a single directory server cannot be considered an SDDS.

The main measure of performance for a given operation in the SDDS paradigm is the number of point-to-point messages exchanged by the sites of the network to perform the operation. Neither the length of the path followed in the network by a message nor its size is relevant in the SDDS context. Note that some variants of SDDS admit the use of multicast to perform range queries.

There are several SDDS proposals in the literature, defining structures based on hashing techniques [3,9,12,15,16], on order preserving techniques [1,2,4,8,10], or on multi-dimensional data management techniques [11,14], among many others.

LH* [9] is the first SDDS that achieves a worst-case constant cost for exact searches and insertions, namely 4 messages. It is based on the popular linear hashing technique. However, like other hashing schemes, while it achieves good performance for single-key operations, it does not perform range searches efficiently. The same is true for any operation executed by means of a scan involving all the servers in the network.

On the contrary, order preserving structures achieve good performance for range searches and a reasonably low (i.e. logarithmic), but not constant, worst-case cost for single-key operations. Among order preserving SDDSs, we recall RP*s [10], based on the B+-tree technique, and BDST [4], based on a balanced binary search tree. Both these structures achieve logarithmic costs for single-key operations in the worst case. Structures in the DRT family [5,8] can guarantee only a linear bound in the worst case, but provide very good performance in the amortized case [5]. Finally, the Distributed B+-tree [2] is the first order preserving structure with a constant worst-case cost for exact searches, but at the price of a linear worst-case cost for insertions.

Here we further develop the technique presented in [2], with the main objective of keeping the worst-case cost of insertions logarithmic. This yields the following results: (i) a worst-case constant cost, namely 4 messages, for exact searches and for insertions that do not cause splits; (ii) a worst-case logarithmic cost for insertions that cause splits; (iii) an amortized almost constant cost for any single-key operation.
Therefore, this is the first order preserving SDDS proposal achieving single-key performance comparable with that of LH*, while continuing to provide the good worst-case complexity for range searches typical of order preserving access methods, like RP* and DRT.

Our structure is also able to support deletions: these are not explicitly considered in previous proposals in the literature, except for BDST [4] and, to some degree, LH* [12]. Moreover, the technique used in our access method can be applied to the distributed k-d tree [14], an SDDS for managing k-dimensional data, with similar results.



2 ADST 

We now introduce our proposal for a distributed search tree, which can be seen as a variant of the systematic correction technique presented in [2]. We first present the basic technique and then discuss our variation.

Each server manages a single bucket of keys. The bucket has a fixed capacity b. We define a server “to be in overflow” or “to go in overflow” when it manages b keys and one more key is assigned to it. When a server s goes in overflow, it starts the split operation. After a split, s manages b/2 keys and b/2 + 1 keys are sent to a new server s_new. It is easy to prove the following property:

Lemma 1. Let σ be a sequence of m intermixed insertions and exact searches. Then we may have at most m/A splits, where A = b/2.
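Lemma 1 can be checked empirically with a small simulation. This Python sketch (the function `insert_all`, the even capacity b = 8 and the random key sample are illustrative assumptions) keeps the buckets of the servers in key order, splits a bucket into b/2 and b/2 + 1 keys when it receives its (b + 1)-th key, and counts the splits. One way to see the bound: without deletions, after the first split every server holds at least b/2 keys, so m keys cannot support m/A or more splits.

```python
import bisect
import random

def insert_all(keys, b):
    """Insert distinct keys into ordered buckets of even capacity b, splitting a
    bucket that receives its (b + 1)-th key. Returns (buckets, number of splits)."""
    buckets = [[]]                 # buckets[i] is the sorted content of server i
    lows = [float("-inf")]         # lows[i] is the left end of server i's interval
    splits = 0
    for k in keys:
        i = bisect.bisect_right(lows, k) - 1              # pertinent server for k
        bisect.insort(buckets[i], k)
        if len(buckets[i]) > b:                           # overflow
            half = b // 2
            old, new = buckets[i][:half], buckets[i][half:]   # b/2 and b/2 + 1 keys
            buckets[i:i + 1] = [old, new]
            lows.insert(i + 1, new[0])                    # s_new takes the upper part
            splits += 1
    return buckets, splits

keys = random.Random(1).sample(range(100_000), 5_000)
buckets, splits = insert_all(keys, b=8)
```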

Moreover, clients and servers have a local indexing structure, called the local tree. It is needed so that clients and servers can avoid address errors. From a logical point of view, the local tree is an incomplete collection of associations (server, interval of keys): for example, an association (s, I(s)) identifies a server s and the interval of keys I(s) it manages.

For further details on buckets and local trees management see [2,8]. 

Let us consider a split of a server s with a new server s'. Given the leaf f associated with s, a split conceptually creates a new leaf f' and a new internal node v, father of the two leaves. This virtual node is associated with s or with s'. Which one is chosen is not important: we assume it is always associated with the new server, in this case s'. s stores s' in the list l of servers on the path from its own leaf to the root, and s' initializes its corresponding list l' with a copy of the list of s (s' included).

Moreover, if this was the first split of s, then s identifies s' as its basic server and stores it in a specific field. Note that the interval I(v) now corresponds to the basic interval of s.

After the split, s sends a correction message containing the information about the split to s' and to the other servers in l. Each server receiving the message corrects its local tree. Each list l of a server s corresponds to the path from the leaf associated with s to the root.

This technique ensures that a server s_v associated with a node v knows the exact partition of the interval I(v) of v and the exact associations between the elements




214 Adriano Di Pasquale and Enrico Nardelli 



of the partition and the servers managing them. In other words, the local tree of s_v contains all the associations (s', I(s')) identifying the partition of I(v). Note that in this case I(v) corresponds to I(lt(s_v)).

This allows s_v to forward a request for a key belonging to I(v) (i.e. a request for which s_v is logically pertinent) directly to the right server, without following the tree structure. Since rotations are not applied in this distributed tree, the association between a server and its basic server never changes.

Suppose a server s receives a request for a key k. If it is pertinent for the request (k ∈ I(s)), then it performs the request and answers the client. Otherwise, if it is logically pertinent for the request (k ∈ I(lt(s))), then it finds the pertinent server in its local tree lt(s) and forwards the request to it. Otherwise, it forwards the request to its basic server s'. We recall that I(lt(s')) corresponds to the basic interval of s and, as stated before, a request for k can only have reached s if k belongs to this interval. Hence s' is certainly logically pertinent.

Therefore a request can be managed with at most 2 address errors and 4 messages.

The main idea of our proposal is to keep the path between any leaf and the root short, in order to reduce the cost of correction messages after a split. To obtain this, we aggregate the internal nodes of the distributed search tree obtained with the techniques described above into compound nodes, and apply the technique of the Distributed B+-tree to the tree made up of compound nodes. For this reason we call our structure ADST (Aggregation in Distributed Search Tree).

Please note that the aggregation only happens at a logical level, in the sense 
that no additional structure has to be introduced. What happens in reality is 
simply that a server associated to a compound node maintains the same infor- 
mation maintained by the one associated to an internal node in the Distributed 
B+-tree. 

Each server s in ADST is conceptually associated with a leaf f. As a leaf, s stores the list l of servers managing compound nodes on the path between f and the (compound) root of the ADST. If s has already split at least once, then it also stores its basic server s'. In this case s' is a server that manages a compound node and such that I(lt(s')) contains the basic interval of s.

Every server records in a field called adjacent the server managing the adjacent interval on its right. Moreover, if s also manages a compound node va(s), then it also maintains a local tree, in addition to the other information.

2.1 Split Management 

Let s be a server conceptually associated with a leaf f, and let the father compound node va* of f be managed by a server s*. Let us now consider a split of s with s_new as the new server. The first operation performed by s is to send correction messages to each server in l.

Then, exactly as in the technique described for the distributed B+-tree [2], a new leaf f' and a new internal node v, father of the two leaves, are conceptually created; v is associated with s_new.
In ADST two situations are possible:

- The node v has to be aggregated with the compound node va*. Then v is released, s does not change anything in its list l, and s_new initializes its list l_new with a copy of l. If this was the first split of s, then s identifies s* as its basic server and stores it in a specific field. Note that the interval I(lt(s*)) contains the basic interval of s.

- The node v does not have to be aggregated with the compound node va*. Then a new compound node va is created as a son of va*, aggregating the single internal node v, and s_new is called to manage va. s changes its list l by adding s_new, and s_new initializes its list l_new with a copy of l. If this was the first split of s, then s identifies s_new as its basic server and stores it in a specific field. Note that the interval I(lt(s_new)) is now exactly the basic interval of s.

The field adjacent of s_new is set to the value stored in the same field of s, and the field adjacent of s is set to s_new (see figure 1).




Fig. 1. Before (left) and after (right) the split of server s with s_new as new server. Intervals are modified accordingly. Correction messages are sent to the servers managing compound nodes stored in the list s.l, and adjacent pointers are modified. Since the aggregation policy decided to create a new compound node and s_new has to manage it, s_new is added to the list s.l of servers between the leaf s and the compound root node, and s_new sets s_new.l = s.l. If this is the first split of s, then s sets s_new as its basic server.
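The bookkeeping of a split can be sketched in a few lines of Python. In this illustration the class `Server`, the function `split` and its `aggregate` flag are hypothetical: the flag stands for the decision of the aggregation policy, which the caller supplies, and s.l is assumed non-empty and ordered from the leaf towards the compound root. The sketch counts the correction messages sent to the servers in s.l, updates the lists, sets the basic server on the first split and relinks the adjacent pointers; local-tree contents are not modelled. The split sequence at the end is a hypothetical one whose resulting lists, basic servers and adjacent pointers for a, b and c agree with the values listed for figure 2.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)
class Server:
    name: str
    l: list = field(default_factory=list)   # compound-node managers, leaf to root
    basic: Optional["Server"] = None         # basic server, fixed at the first split
    adjacent: Optional["Server"] = None      # server of the interval on the right
    has_split: bool = False

def split(s, new, aggregate):
    """Update s and new after s splits; `aggregate` is the aggregation policy's
    decision, supplied by the caller. Returns the correction messages sent."""
    corrections = len(s.l)                  # one message to every server in s.l
    if aggregate:                           # v joins the father compound node va*
        new.l = list(s.l)
        if not s.has_split:
            s.basic = s.l[0]                # s*, the manager of va*
    else:                                   # new compound node va, managed by new
        s.l = [new] + s.l
        new.l = list(s.l)
        if not s.has_split:
            s.basic = new
    s.has_split = True
    new.adjacent, s.adjacent = s.adjacent, new
    return corrections

a, b, c, d, q = (Server(x) for x in "abcdq")
a.l = [a]                                   # a manages the compound root node
split(a, b, aggregate=True)
split(b, c, aggregate=False)
split(c, d, aggregate=True)
sent = split(c, q, aggregate=False)
```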



2.2 Aggregation Policy 

The way compound nodes are created in the structure is called the aggregation policy. We require that an aggregation policy create compound nodes so that the height of the tree made up of the compound nodes is logarithmic in the number of servers of the ADST. In this way the cost of correcting the local trees after a split is logarithmic as well.

One can design several aggregation policies satisfying the previous requirement. The one we use is the following.

(AP): Each compound node va is associated with a bound l(va) on the number of its internal nodes. The bound of the root compound node ra is l(ra) = 1. If the compound node va', father of va, has bound l(va'), then $l(va) = 2\,l(va') + 1$.

Suppose a server associated with a leaf son of va splits. If the bound l(va) is not reached, va aggregates the new internal node v. Otherwise a new compound node is created as a son of va, aggregating v.

It is easy to prove the following: 

Invariant 1. Let $va_0$ be the compound root node, and let $va_0, va_1, \ldots, va_k$ be the compound nodes on the path between $va_0$ and a leaf. Then $\#internal\_nodes(va_i) = l(va_i)$ for each $0 \le i \le k-1$, and $\#internal\_nodes(va_k) \le l(va_k)$.

With reference to figure 2, we have for example: I(a) = A, I(b) = B, and so on; a.adjacent = b, b.adjacent = c, c.adjacent = q, and so on; a.l = {a}, b.l = {c, a}, c.l = {q, c, a}, and so on; lt(f) = {(d, D), (f, F), (g, G), (h, H), (i, I), (l, L), (o, O), (p, P), (m, M), (n, N)} and hence I(va(f)) = I(lt(f)) = D ∪ F ∪ G ∪ H ∪ I ∪ L ∪ O ∪ P ∪ M ∪ N; from the given sequence of splits, a.basicserver = a, b.basicserver = c, c.basicserver = c, and so on.

Theorem 2. Aggregation policy AP guarantees that the length of any path between a leaf and the compound root node is bounded by $h_a = k \le \lfloor \log n \rfloor + 1$, where n is the number of internal nodes of the distributed tree.

Proof. Let us consider a generic leaf f, and let $h_a = k$ be its height in the tree of compound nodes. Then there are k compound nodes $va_0, va_1, \ldots, va_{k-1}$ on the path between f and the root compound node, and $va_0$ is just the compound root node. It follows directly from the definition of policy AP that $\#internal\_nodes(va_0) = l(va_0) = 1$, $\#internal\_nodes(va_1) = l(va_1) = 2^1 + 1$, $\#internal\_nodes(va_2) = l(va_2) = 2^2 + 1$, ..., $\#internal\_nodes(va_{k-2}) = l(va_{k-2}) = 2^{k-2} + 1$, and $\#internal\_nodes(va_{k-1}) \ge 1$. Then the number n of internal nodes, in the case k > 1, is such that:

$$n \ge 1 + \sum_{i=1}^{k-2} (2^i + 1) + 1 = k - 2 + \frac{2^{k-1} - 1}{2 - 1} + 1 = k - 2 + 2^{k-1} \;\Rightarrow\; n \ge 2^{k-1}, \quad \forall k > 1$$

Then we have $h_a = k \le \lfloor \log n \rfloor + 1$.
Fig. 2. An example of ADST with policy AP. Lower-case letters denote servers and associated leaves, upper-case letters denote intervals of the data domain. The sequence of splits producing the structure is a → b → c, then d → f → … → m → n, then l → o → p, then c → q and finally e → r, where x → y means that the split of x creates the server y.



We recall that for the binary tree we are considering, that is a tree where every node has 0 or 2 sons, the number of leaves, and hence of servers, is n = n' + 1, where n' is the number of internal nodes. In essence, the previous theorem states that the cost of correcting local trees after a split is O(log n) messages, where n is the number of servers in the ADST.
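The counting argument behind Theorem 2 can be replayed numerically. The sketch below (the function names are hypothetical) takes the level bounds from policy AP as stated, 1 at the compound root and l(va) = 2l(va') + 1 below it, computes the fewest internal nodes a tree can have when some leaf lies under k compound nodes (the first k − 1 full by Invariant 1, the last holding at least one), and checks that this minimum is at least 2^(k−1), which is the inequality the proof relies on to conclude k ≤ ⌊log n⌋ + 1.

```python
def level_bounds(k):
    """Bounds l(va_0), ..., l(va_{k-1}) along a path of k compound nodes under
    policy AP: 1 at the compound root, then l(va) = 2 * l(va') + 1."""
    bounds = [1]
    while len(bounds) < k:
        bounds.append(2 * bounds[-1] + 1)
    return bounds

def min_internal_nodes(k):
    """Fewest internal nodes compatible with a leaf below k compound nodes:
    the first k - 1 are full and the last one holds at least one node."""
    return sum(level_bounds(k)[:-1]) + 1

def height_bound(n):
    """floor(log2 n) + 1 for an integer n >= 1, computed exactly."""
    return n.bit_length()
```

Using `int.bit_length` avoids floating-point rounding in log2 for large n.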



2.3 Access Protocols 

We now analyze what happens when a client c has to perform a single-key request for a key k. We describe the cases of exact search, insertion and deletion:

Exact Search: c looks for the pertinent server for k in its local tree, finds 
the server s, and sends it the request. If s is pertinent, it performs the request 
and sends the result to c. 

Suppose s is not pertinent. If s does not manage a compound node, then it forwards the request to its basic server s'. We recall that I(lt(s')) includes the basic interval of s and, as stated before, a request for k can only have reached s if k belongs to this interval. Therefore s' is certainly logically pertinent: it looks for the pertinent server for k in its local tree and finds the server s''. Then s' forwards the request to s'', which performs the request and answers c. In this case c receives the local tree of s' in the answer, so that it can update its own local tree (see figure 3).

Suppose now that s manages a compound node. The way in which compound nodes are created ensures that I(lt(s)) includes the basic interval of s itself. Then s has to be logically pertinent, hence it finds the pertinent server in lt(s) and sends it the request. In this case c receives the local tree of s in the answer.







Fig. 3. Worst-case of the access protocol 
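The exact-search protocol can be traced with a small message-counting sketch. In this Python illustration, intervals are half-open integer ranges, local trees are plain dictionaries from server names to intervals, and the scenario (a split off c as a new compound node, then c split into d inside that compound node) is hypothetical; `Srv`, `lookup` and `exact_search` are names introduced here. A stale client guess costs at most 4 messages, after which the local tree returned in the answer gives a direct hit.

```python
from dataclasses import dataclass, field

@dataclass
class Srv:
    name: str
    lo: int
    hi: int                                   # I(s) = [lo, hi)
    lt: dict = field(default_factory=dict)    # local tree; non-empty iff s manages a compound node
    basic: str = ""                           # name of the basic server

def lookup(tree, k):
    """Name of the server that `tree` believes pertinent for k (assumed to exist)."""
    return next(n for n, (lo, hi) in tree.items() if lo <= k < hi)

def exact_search(servers, client_lt, k):
    """Return (pertinent server, messages used); refreshes client_lt in place."""
    s = servers[lookup(client_lt, k)]
    msgs = 1                                  # request: client -> s
    if not s.lo <= k < s.hi:                  # address error at s
        helper = s
        if not s.lt:                          # s manages no compound node
            helper = servers[s.basic]
            msgs += 1                         # s -> basic server
        target = servers[lookup(helper.lt, k)]   # helper is logically pertinent
        if target is not helper:
            msgs += 1                         # helper -> pertinent server
        client_lt.update(helper.lt)           # the answer carries lt(helper)
        s = target
    return s.name, msgs + 1                   # answer: pertinent server -> client

servers = {
    "a": Srv("a", 0, 15, basic="c"),
    "c": Srv("c", 15, 22, lt={"a": (0, 15), "c": (15, 22), "d": (22, 30)}, basic="c"),
    "d": Srv("d", 22, 30),
}
client_lt = {"a": (0, 30)}                    # stale image from before the splits
```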



Insertion: the protocol for exact search is performed in order to find the pertinent server s for k. Then s inserts k in its bucket. If this insertion causes s to go in overflow, then a split is performed. After the split, correction messages are sent to the servers in the list l of s.

Deletion: the protocol for exact search is performed in order to find the 
pertinent server s for k. Then s deletes k from its bucket. If this deletion causes 
s to go in underflow then a merge, which is the opposite operation of a split, is 
performed. 

We consider a server to be in underflow whenever it manages fewer than b/d keys in its bucket, for a fixed constant d. The merge operation basically consists in releasing an existing server s which is in underflow. If s is not empty, it sends its remaining keys to another server s'. From then on, I(s') is enlarged by uniting it with I(s). After the merge, correction messages are sent to the servers in the list l of s. Moreover, an algorithm to preserve the invariant of policy AP is applied after a merge; a detailed presentation of this algorithm is quite long and is left to the extended version of this paper. Note that s' is chosen only if it is able to receive all the keys of s without going in overflow. If there is no such server, a server s*, whose interval is adjacent to I(s), is chosen, and it sends a suitable number of keys to s in order to allow s to exit from the underflow state. I(s) and I(s*) are modified accordingly.
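The choice between merging and borrowing keys can be written as a small decision function. The interface below is hypothetical: `candidates` stands for the buckets of servers that might absorb s, and, since the text does not fix how many keys s* sends, moving the fewest keys that lift s to b/d is an assumption. The AP restructuring and the correction messages are not modelled.

```python
def resolve_underflow(s_keys, candidates, b, d):
    """Decide how server s leaves underflow (fewer than b/d keys).
    Returns ("ok", None), ("merge", i) or ("borrow", n)."""
    if len(s_keys) * d >= b:                 # at least b/d keys: no underflow
        return ("ok", None)
    for i, keys in enumerate(candidates):
        if len(keys) + len(s_keys) <= b:     # s' must not go in overflow
            return ("merge", i)
    # No such server: the adjacent server s* sends keys to s. Sending the
    # fewest keys that reach ceil(b / d) is an assumption of this sketch.
    return ("borrow", -(-b // d) - len(s_keys))
```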

Previous SDDSs, e.g. LH*, RP*, DRT, etc., do not explicitly consider deletions. Hence, in order to compare the performance of ADST with that of previous SDDSs, we shall not analyze the behavior of ADST under deletions.
2.4 Range Search

We now describe how a range search is performed in ADST. 

The protocol for exact search is performed in order to find the server s pertinent for the leftmost value of the range. If the range is not completely covered by s, then s sends the request to the server s' stored in its field adjacent; s' does the same. Following the adjacent pointers, all the servers covering the range are reached and answer the client. The operation stops when the server pertinent for the rightmost value of the range is reached (see figure 4).

The above algorithm is very simple and applies to the case when only the point-to-point protocol is available. Other variants can be considered; for example, a client can send more than one request message if it discovers from its local tree that more than one server intersects the range of the query.

When the range of a query is large, the multicast protocol is usually applied, if it is available in the considered technological framework. The same technique can be applied to ADST as well.







Fig. 4. Worst-case of the range search. The server s is pertinent for the leftmost value 
of the range 
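The point-to-point range search reduces to a walk along the adjacent pointers. In this Python sketch the class `Node`, the helpers `chain` and `range_search`, and the half-open integer intervals are illustrative assumptions; the walk starts from the server pertinent for the leftmost value and stops at the server whose interval contains the rightmost value.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class Node:
    name: str
    lo: int
    hi: int                                 # the server manages keys in [lo, hi)
    adjacent: Optional["Node"] = None       # server of the interval on the right

def chain(bounds):
    """Build servers s0, s1, ... for consecutive intervals and link them."""
    nodes = [Node(f"s{i}", lo, hi) for i, (lo, hi) in enumerate(zip(bounds, bounds[1:]))]
    for left, right in zip(nodes, nodes[1:]):
        left.adjacent = right
    return nodes

def range_search(start, lo, hi):
    """Names of the servers covering [lo, hi], starting from the server
    pertinent for lo and following adjacent pointers."""
    assert start.lo <= lo < start.hi
    covering, s = [start.name], start
    while hi >= s.hi and s.adjacent is not None:
        s = s.adjacent                      # one forward message
        covering.append(s.name)
    return covering

nodes = chain([0, 10, 20, 30, 40])
```

With k covering servers the loop performs k − 1 forwards; adding the 2 extra messages that the exact-search protocol may need to locate the first server gives the k + 1 messages counted in the complexity analysis.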



3 Complexity Analysis 

In what follows we assume an environment where clients work slowly. More precisely, we assume that between two consecutive requests the involved servers have the time to complete all updates of their local trees: we call this a low concurrency state. Every communication complexity result in SDDS proposals is based on this assumption. In the extended version of this paper [6] we show how the complexity analysis of SDDSs is influenced by this assumption, and we give some ideas on how to operate in the case of fast working clients (also called high concurrency).
3.1 Communication Complexity

The basic performance parameter for an SDDS is the communication complexity, 
that is the number of messages needed to perform the requests of clients. We 
present the main results obtained with ADST for this parameter. 

Theorem 3. An exact search has in an ADST a worst-case cost of 4 messages. An insertion that does not cause a split and a deletion that does not cause a merge also have in an ADST a worst-case cost of 4 messages.

Proof. Follows directly from the presented algorithms. 

Theorem 4. A split and the following corrections of local trees have in an ADST a worst-case cost of $\log n + 5$ messages.

Proof. Follows directly from the fact that a split costs 4 messages and from 
theorem 2. 

Theorem 5. A range search has in an ADST a worst-case cost of k + 1 messages, where k is the number of servers covering the range of the query, without accounting for the single request message and the k response messages.

Proof. Follows directly from the algorithm. Theorem 3 ensures that the search for the server covering the leftmost limit of the range adds 2 messages to the cost, without accounting for the request and answer messages. Then we add another k − 1 messages to reach, by following the adjacent pointers, the remaining k − 1 servers covering the range.

As presented in sub-section 2.3, in ADST a merge is followed by the correction of local trees and by a restructuring of the tree made up of compound nodes, in order to preserve Invariant 1. In the extended version of this paper we prove the following theorem.

Theorem 6. In ADST a merge costs O(log n) messages, accounting for the local tree correction messages and the restructuring algorithms.

The following theorems show the behavior of ADST in the amortized case. 

Theorem 7. A sequence of intermixed exact searches and insertions on ADST has an amortized cost of $4 + \frac{2(\log n + 5)}{b}$ messages per operation, where b is the capacity of a bucket.

Proof. Let us consider a sequence of m intermixed exact searches and insertions 
performed on ADST. From lemma 1 we have at most splits, where A = |. 
From theorems 3 and 4 the total number of message for the sequence is C < 
4m -I- (log n -I- 5) <m(^4-\- AIheiDA) ^ ^ hence the result holds. 

Theorem 8. Let $b$ be the capacity of a bucket and $b/d$ be the merge threshold, for a fixed constant $d > 2$. Then a sequence of intermixed exact searches, insertions and deletions on ADST has an amortized cost of $4 + O\left(\frac{\log n}{b}\right)$ messages per operation.

Proof. (Sketch). Let us consider a sequence of $m$ intermixed exact searches, insertions, and deletions performed on ADST. Let $D = \frac{b}{2} - \frac{b}{d} = \frac{(d-2)\,b}{2d}$. For ease of presentation, we assume without loss of generality that … . From Lemma 1, it is easy to verify that we can have at most $m/D$ operations among splits and merges. In the extended version we show that the worst case for such a sequence is when, with respect to a server, a split is followed by a merge, and vice versa. From Theorems 3, 4 and 6 the total number of messages for the sequence is $C \le 4m + \frac{m}{D}\,O(\log n) \le m\left(4 + O\left(\frac{\log n}{b}\right)\right)$, since $d$ is a constant; hence the result holds.

We conclude by showing a basic fact that holds in a realistic framework.

Assumption 9. The number $n$ of servers participating in an ADST is such that $\log n < kb$, where $b$ is the capacity of a bucket and $k > 0$ is a constant.

Indeed, in a realistic situation $b$ is at least in the order of hundreds or thousands; assuming $k = 1$, the assumption then reads $n < 2^b$, which is true in practice. Hence the assumption is realistic. For the rest of the paper we therefore assume that $\log n < b$.

Under Assumption 9, Theorems 7 and 8 show that in practice ADST has an amortized constant cost for any single-key operation.
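As a worked example of the Theorem 7 bound, the following Python sketch evaluates $4 + (\log n + 5)/A$ with $A = b/2$. It assumes base-2 logarithms, which the text does not state explicitly, and the chosen values of $n$ and $b$ are illustrative, not measurements.

```python
import math

def thm7_amortized_bound(n, b):
    """Amortized messages per operation, 4 + (log2 n + 5) / A with A = b / 2,
    for n servers and bucket capacity b (log base 2 is an assumption)."""
    A = b / 2
    return 4 + (math.log2(n) + 5) / A

# Example: about a million servers (2**20) and buckets of 100 records.
print(thm7_amortized_bound(2**20, 100))  # 4.5
```

Under $\log n < b$ the second term equals $2\log n / b + 10/b < 2 + 10/b$, so the bound stays below $6 + 10/b$ whatever the number of servers.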

4 Conclusions 

We presented ADST (Aggregation in Distributed Search Tree). It is the first order-preserving SDDS that obtains a constant single-key query cost, like LH*, and at the same time an optimal cost for range queries. More precisely, our structure features: (i) a worst-case cost of 4 messages for exact-search queries, (ii) a logarithmic cost for insert queries producing a split, (iii) an optimal cost for range searches, that is, a range search can be answered with $O(k)$ messages, where $k$ is the number of servers covering the query range, and (iv) an amortized almost constant cost for any single-key query.

The internal load of a server is the one typical of the order-preserving SDDSs proposed so far. In particular, servers managing compound nodes may also be the ones managing buckets of the structure, as in DRT [8], or they may be dedicated servers, as in RP* [10]. The choice does not influence the correctness of the ADST technique.

Moreover, ADST is also able to manage deletions, and it is easily extendible to manage $k$-dimensional data with the same results. Furthermore, ADST is orthogonal to the techniques used to guarantee fault tolerance, in particular to the one in [13], which provides a high-availability SDDS.

Hence our proposal is a theoretical achievement potentially attractive for distributed applications requiring high performance for single-key and range queries, high availability and possibly the management of multi-dimensional data.

In [7], experimental comparisons exploring the behavior of ADST in the average case and comparing it with existing structures are presented. The result is that ADST is the best choice among the structures compared in the average case as well. This makes ADST interesting not only at the theoretical level but also from an application point of view.

Future work will also study the impact of using different aggregation policies. Any aggregation policy ensuring the results of Theorem 2 can be applied.



References 

1. P. Bozanis, Y. Manolopoulos: DSL: Accommodating Skip Lists in the SDDS Model, Workshop on Distributed Data and Structures (WDAS 2000), L'Aquila, June 2000.

2. Y. Breitbart, R. Vingralek: Addressing and Balancing Issues in Distributed B+-Trees, 1st Workshop on Distributed Data and Structures (WDAS'98), 1998.

3. R. Devine: Design and implementation of DDH: a distributed dynamic hashing algorithm, Int. Conf. on Foundations of Data Organization and Algorithms (FODO), Chicago, 1993.

4. A. Di Pasquale, E. Nardelli: Fully Dynamic Balanced and Distributed Search Trees with Logarithmic Costs, Workshop on Distributed Data and Structures (WDAS'99), Princeton, NJ, May 1999.

5. A. Di Pasquale, E. Nardelli: Distributed searching of k-dimensional data with almost constant costs, ADBIS 2000, Prague, September 2000.

6. A. Di Pasquale, E. Nardelli: ADST: Aggregation in Distributed Search Trees, Technical Report 1/2001, University of L'Aquila, February 2001.

7. A. Di Pasquale, E. Nardelli: A Very Efficient Order Preserving Scalable Distributed Data Structure, accepted for publication at DEXA 2001 Conference.

8. B. Kroll, P. Widmayer: Distributing a search tree among a growing number of processors, ACM SIGMOD Int. Conf. on Management of Data, pp. 265-276, Minneapolis, MN, 1994.

9. W. Litwin, M.A. Neimat, D.A. Schneider: LH* - Linear hashing for distributed files, ACM SIGMOD Int. Conf. on Management of Data, Washington, D.C., 1993.

10. W. Litwin, M.A. Neimat, D.A. Schneider: RP* - A family of order-preserving scalable distributed data structures, 20th Conf. on Very Large Data Bases, Santiago, Chile, 1994.

11. W. Litwin, M.A. Neimat, D.A. Schneider: k-RP*_s - A High Performance Multi-Attribute Scalable Distributed Data Structure, International Conference on Parallel and Distributed Information Systems, December 1996.

12. W. Litwin, M.A. Neimat, D.A. Schneider: LH* - A Scalable Distributed Data Structure, ACM Trans. on Database Systems, 21(4), 1996.

13. W. Litwin, T.J.E. Schwarz, S.J.: LH*_rs: a High-availability Scalable Distributed Data Structure using Reed Solomon Codes, ACM SIGMOD Int. Conf. on Management of Data, 1999.

14. E. Nardelli, F. Barillari, M. Pepe: Distributed Searching of Multi-Dimensional Data: a Performance Evaluation Study, Journal of Parallel and Distributed Computing (JPDC), 49, 1998.

15. R. Vingralek, Y. Breitbart, G. Weikum: Distributed file organization with scalable cost/performance, ACM SIGMOD Int. Conf. on Management of Data, Minneapolis, MN, 1994.

16. R. Vingralek, Y. Breitbart, G. Weikum: SNOWBALL: Scalable Storage on Networks of Workstations with Balanced Load, Distr. and Par. Databases, 6(2), 1998.




Approximative Learning of Regular Languages 



Henning Fernau 

Wilhelm-Schickard-Institut für Informatik
Universität Tübingen
Sand 13, D-72076 Tübingen, Germany
fernau@informatik.uni-tuebingen.de



Abstract. We show how appropriately chosen functions $f$, which we call distinguishing, can be used to make deterministic finite automata backward deterministic. These ideas have been exploited to design regular language classes called $f$-distinguishable, which are identifiable in the limit from positive samples. Special cases of this approach are the $k$-reversible and terminal distinguishable languages as discussed in [1,3,5,15,16]. Here, we give new characterizations of these language classes. Moreover, we show that all regular languages can be approximated in the setting introduced by Kobayashi and Yokomori [12,13]. Finally, we prove that the class of all function-distinguishable languages is equal to the class of regular languages.



1 Introduction 

Identification in the limit from positive samples, also known as exact learning 
from text as proposed by Gold [10], is one of the oldest yet most important models 
of grammatical inference. Since not all regular languages can be learned exactly 
from text, the characterization of identifiable subclasses of regular languages is 
a useful line of research, because the regular languages are a very basic language 
family. 

In [6], we introduced the so-called function-distinguishable languages as a rich source of examples of identifiable language families. Among the language families which turn out to be special cases of our approach are the $k$-reversible languages [1] and the terminal-distinguishable languages [15,16], which belong, according to Gregor [11], to the most popular identifiable regular language classes. Moreover, we have shown [6] how to transfer the ideas underlying the well-known identifiable language classes of $k$-testable languages, $k$-piecewise testable languages and threshold testable languages to our setting. In a nutshell, an identification algorithm for $f$-distinguishable languages assigns to every finite set of samples $I_+ \subseteq T^*$ the smallest¹ $f$-distinguishable language containing $I_+$ by successively merging states which violate the definition of $f$-distinguishable automata, starting with the simple prefix tree automaton accepting $I_+$.

¹ This is well-defined, since each class of $f$-distinguishable languages is closed under intersection, see Theorem 2.
In this paper, we first give a further useful characterization of function-distinguishable languages which has also been employed in other papers [4,7,8]. This also allows us to define a possible merging-state inference strategy in a concise manner. Then, we focus on questions of approximability of regular languages by function-distinguishable languages in the setting introduced by Kobayashi and Yokomori [13].

The paper is organized as follows: In Section 2, we provide the necessary background from formal language theory, and in Sections 3 and 4, we introduce the central concepts of the paper, namely the so-called distinguishing functions and the function-distinguishable automata and languages. In Section 5, we discuss an alternative definition of the function canonical automata, which we used as a compact presentation in other papers. In Section 6, we show how to approximate arbitrary regular languages by function-distinguishable languages, based on the notion of upper-best approximation in the limit introduced by Kobayashi and Yokomori in [12,13]. Section 7 concludes the paper, indicating practical applications of our method and extensions to non-regular language families.

An extended version of this paper is available as Technical Report WSI 2001-2. 



2 General Definitions 

$\Sigma^*$ is the set of words over the alphabet $\Sigma$. $\Sigma^k$ ($\Sigma^{<k}$) collects the words whose lengths are equal to (less than) $k$. $\lambda$ denotes the empty word. $\mathrm{Pref}(L)$ is the set of prefixes of $L$ and $u^{-1}L = \{v \in \Sigma^* \mid uv \in L\}$ is the quotient of $L \subseteq \Sigma^*$ by $u$.
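For finite languages, both notions can be computed directly. The following Python sketch represents words as strings, with the empty string standing for $\lambda$; it is an illustration only and does not handle infinite languages.

```python
def prefixes(language):
    """Pref(L) for a finite language L given as a set of strings ('' is lambda)."""
    return {w[:i] for w in language for i in range(len(w) + 1)}

def quotient(u, language):
    """u^{-1}L = {v | uv in L} for a finite language L."""
    return {w[len(u):] for w in language if w.startswith(u)}

L = {"ab", "ac", "b"}
print(sorted(prefixes(L)))       # ['', 'a', 'ab', 'ac', 'b']
print(sorted(quotient("a", L)))  # ['b', 'c']
```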

We assume that the reader knows that regular languages can be characterized by (deterministic) finite automata $A = (Q, T, \delta, q_0, Q_F)$, where $Q$ is the state set, $\delta \subseteq Q \times T \times Q$ is the transition relation, $q_0 \in Q$ is the initial state and $Q_F \subseteq Q$ is the set of final states. As usual, $\delta^*$ denotes the extension of the transition relation to arbitrarily long input words. The language defined by an automaton $A$ is written $L(A)$. An automaton is called stripped iff all states are accessible from the initial state and all states lead to some final state. Observe that the transition function of a stripped deterministic finite automaton is not total in general.

We denote the minimal deterministic automaton of the regular language $L$ by $A(L)$. Recall that $A(L) = (Q, T, \delta, q_0, Q_F)$ can be described as follows: $Q = \{u^{-1}L \mid u \in \mathrm{Pref}(L)\}$, $q_0 = \lambda^{-1}L = L$; $Q_F = \{u^{-1}L \mid u \in L\}$; and $\delta(u^{-1}L, a) = (ua)^{-1}L$ with $u, ua \in \mathrm{Pref}(L)$, $a \in T$. According to our definition, any minimal deterministic automaton is stripped.
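The quotient description of $A(L)$ translates directly into a construction for finite languages, using $(ua)^{-1}L = a^{-1}(u^{-1}L)$. In this Python sketch (a finite, nonempty $L$ is assumed) the states are the distinct quotients themselves, represented as frozensets, and a transition on $a$ is created only when the resulting quotient is nonempty, so the result is stripped and, in general, not total.

```python
def minimal_dfa(language, alphabet):
    """A(L) for a finite nonempty L: states are the quotients u^{-1}L, u in Pref(L)."""
    start = frozenset(language)               # lambda^{-1} L = L
    states, delta, todo = {start}, {}, [start]
    while todo:
        q = todo.pop()
        for a in alphabet:
            nxt = frozenset(w[1:] for w in q if w.startswith(a))  # a^{-1}(u^{-1}L)
            if nxt:                           # only if ua is still in Pref(L)
                delta[(q, a)] = nxt
                if nxt not in states:
                    states.add(nxt)
                    todo.append(nxt)
    finals = {q for q in states if "" in q}   # u^{-1}L contains lambda iff u in L
    return states, delta, start, finals

states, delta, start, finals = minimal_dfa({"ab", "b"}, "ab")
print(len(states))  # 3
```

Here the prefixes $ab$ and $b$ share the quotient $\{\lambda\}$, so four prefixes yield only three states.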

Furthermore, we need two automaton constructions in the following:

The product automaton $A = A_1 \times A_2$ of two automata $A_i = (Q_i, T, \delta_i, q_{0,i}, Q_{F,i})$ for $i = 1, 2$ is defined as $A = (Q, T, \delta, q_0, Q_F)$ with $Q = Q_1 \times Q_2$, $q_0 = (q_{0,1}, q_{0,2})$, $Q_F = Q_{F,1} \times Q_{F,2}$, and $((q_1, q_2), a, (q_1', q_2')) \in \delta$ iff $(q_1, a, q_1') \in \delta_1$ and $(q_2, a, q_2') \in \delta_2$.
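A sketch of the product construction for deterministic automata, with transitions stored in a Python dict mapping (state, letter) to state. Restricting to deterministic automata is a simplification of the relational definition above, and the helper `accepts` as well as the two example automata are illustrative.

```python
from itertools import product

def product_automaton(A1, A2, alphabet):
    """A1 x A2; each automaton is (states, delta, q0, finals) with delta a dict
    (state, letter) -> state (partial transition functions are allowed)."""
    Q1, d1, s1, F1 = A1
    Q2, d2, s2, F2 = A2
    Q = set(product(Q1, Q2))
    delta = {}
    for (p, q), a in product(Q, alphabet):
        if (p, a) in d1 and (q, a) in d2:   # both components must move on a
            delta[((p, q), a)] = (d1[(p, a)], d2[(q, a)])
    return Q, delta, (s1, s2), set(product(F1, F2))

def accepts(A, word):
    _, delta, q, finals = A
    for a in word:
        if (q, a) not in delta:
            return False
        q = delta[(q, a)]
    return q in finals

even_a = ({0, 1}, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
ends_b = ({0, 1}, {(0, "a"): 0, (1, "a"): 0, (0, "b"): 1, (1, "b"): 1}, 0, {1})
P = product_automaton(even_a, ends_b, "ab")
print(accepts(P, "aab"), accepts(P, "ab"))  # True False
```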

A partition of a set $S$ is a collection of pairwise disjoint nonempty subsets of $S$ whose union is $S$. If $\pi$ is a partition of $S$, then, for any element $s \in S$, there is a unique element of $\pi$ containing $s$, which we denote $B(s, \pi)$ and call the block of $\pi$ containing $s$. A partition $\pi$ is said to refine another partition $\pi'$ iff every block of $\pi'$ is a union of blocks of $\pi$. If $\pi$ is any partition of the state set $Q$ of the automaton $A = (Q, T, \delta, q_0, Q_F)$, then the quotient automaton $\pi^{-1}A = (\pi^{-1}Q, T, \delta', B(q_0, \pi), \pi^{-1}Q_F)$ is given by $\pi^{-1}Q' = \{B(q, \pi) \mid q \in Q'\}$ (for $Q' \subseteq Q$) and $(B_1, a, B_2) \in \delta'$ iff $\exists q_1 \in B_1\, \exists q_2 \in B_2 : (q_1, a, q_2) \in \delta$.
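The quotient automaton can be sketched in the same style, here with the transition relation kept as a set of triples, since merging states may create nondeterminism. Blocks are represented as frozensets; the example, an illustration of our own choosing, merges the two final states of the prefix tree acceptor of $\{a, aa\}$.

```python
def quotient_automaton(transitions, q0, finals, partition):
    """Return (delta', B(q0, pi), pi^{-1} Q_F) for a partition pi of the states.

    transitions: set of triples (q, a, q'); partition: iterable of disjoint
    blocks covering all states."""
    block_of = {}
    for block in partition:
        b = frozenset(block)
        for q in b:
            block_of[q] = b                 # B(q, pi)
    new_transitions = {(block_of[p], a, block_of[q]) for (p, a, q) in transitions}
    new_finals = {block_of[q] for q in finals}
    return new_transitions, block_of[q0], new_finals

# Prefix tree acceptor of {a, aa}: 0 -a-> 1 -a-> 2, final states 1 and 2.
pta = {(0, "a", 1), (1, "a", 2)}
delta, start, finals = quotient_automaton(pta, 0, {1, 2}, [{0}, {1, 2}])
```

After the merge, the block $\{1, 2\}$ carries an $a$-loop, so the quotient accepts $a^+$.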

3 Distinguishing Functions 

In order to avoid cumbersome case discussions, let us now fix $T$ as the input alphabet of the finite automata we are going to discuss.

Definition 1. Let $F$ be some finite set. A mapping $f : T^* \to F$ is called a distinguishing function if $f(w) = f(z)$ implies $f(wu) = f(zu)$ for all $u, w, z \in T^*$.

In the literature, we can find the terminal function [16]

$\mathrm{Ter}(x) = \{ a \in T \mid \exists u, v \in T^* : uav = x \}$

and, more generally, the $k$-terminal function [5]

$\mathrm{Ter}_k(x) = (\pi_k(x), \mu_k(x), \sigma_k(x))$, where
$\mu_k(x) = \{ a \in T^{k+1} \mid \exists u, v \in T^* : uav = x \}$

and $\pi_k(x)$ [$\sigma_k(x)$] is the prefix [suffix] of length $k$ of $x$ if $x \notin T^{<k}$, and $\pi_k(x) = \sigma_k(x) = x$ if $x \in T^{<k}$. The example $f(x) = \sigma_k(x)$ leads to the $k$-reversible languages, confer [1,5]. In particular, the trivial distinguishing function, whose range is a singleton set, characterizes the 0-reversible languages. Other examples of distinguishing functions in the context of even linear languages can be found in [4,15].
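These functions are easy to write down for strings. The Python sketch below implements $\mathrm{Ter}$, $\sigma_k$ and $\mathrm{Ter}_k$ (with $\pi_k$ and $\sigma_k$ returning the whole word when it is shorter than $k$) and tests the defining implication of Definition 1 by brute force on short words. Such a bounded test can only refute, never prove, that a function is distinguishing; the helper names and the counterexample function are illustrative.

```python
from itertools import product

def ter(x):
    """Ter(x): the set of letters occurring in x."""
    return frozenset(x)

def sigma(k):
    """sigma_k: the suffix of length k of x (x itself if |x| < k)."""
    return lambda x: x[max(len(x) - k, 0):]

def ter_k(k):
    """Ter_k(x) = (pi_k(x), mu_k(x), sigma_k(x)); mu_k(x) = infixes of length k + 1."""
    def f(x):
        infixes = frozenset(x[i:i + k + 1] for i in range(len(x) - k))
        return (x[:k], infixes, x[max(len(x) - k, 0):])
    return f

def words(alphabet, max_len):
    """All words over the alphabet of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def is_distinguishing_up_to(f, alphabet, max_len, ext_len=2):
    """Test f(w) = f(z) => f(wu) = f(zu) for |w|, |z| <= max_len, |u| <= ext_len."""
    ws = list(words(alphabet, max_len))
    us = list(words(alphabet, ext_len))
    for w in ws:
        for z in ws:
            if f(w) == f(z) and any(f(w + u) != f(z + u) for u in us):
                return False
    return True

def more_a_than_b(x):
    """Not distinguishing: f('') = f('b'), but f('a') != f('ba')."""
    return x.count("a") > x.count("b")

print(ter_k(1)("abb"))  # ('a', frozenset({'ab', 'bb'}), 'b') up to set ordering
```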

Observe that every regular language $R$ induces, via its Nerode equivalence classes, a distinguishing function $f_R$, where $f_R(w)$ maps $w$ to the equivalence class containing $w$. In particular, $T^*$ leads to a trivial distinguishing function $f_{T^*} : T^* \to \{q\}$, and the class of $f_{T^*}$-distinguishable languages coincides with the class of 0-reversible languages [1] over the alphabet $T$.

In some sense, these are the only distinguishing functions, since one can associate to every distinguishing function $f$ a finite automaton $A_f = (F, T, \delta_f, f(\lambda), F)$ by setting $\delta_f(q, a) = f(wa)$, where $w \in f^{-1}(q)$ can be chosen arbitrarily, since $f$ is a distinguishing function.
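The automaton $A_f$ can be built by exploring the values of $f$ from $f(\lambda)$, keeping one representative word per value; by Definition 1 the choice of representative does not matter. This Python sketch assumes $f$ is given as a function on strings with hashable values, explores only the values actually reached, and raises an error if more than a chosen bound of values appears (the bound is an assumption of the sketch).

```python
from collections import deque

def automaton_of(f, alphabet, max_states=1000):
    """Return (states, delta_f, start) of A_f; every state is final in A_f."""
    start = f("")
    rep = {start: ""}          # one representative word w with f(w) = q
    delta = {}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for a in alphabet:
            r = f(rep[q] + a)  # delta_f(q, a) = f(wa) for any w in f^{-1}(q)
            delta[(q, a)] = r
            if r not in rep:
                if len(rep) >= max_states:
                    raise ValueError("too many values; f may not have a finite range")
                rep[r] = rep[q] + a
                queue.append(r)
    return set(rep), delta, start

sigma_1 = lambda x: x[-1:]     # suffix of length 1 ('' for the empty word)
states, delta, start = automaton_of(sigma_1, "ab")
print(sorted(states))          # ['', 'a', 'b']
```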



4 Function Distinguishable Languages 

Here, we will formally introduce function distinguishable languages and discuss 
some formal language properties. 

Definition 2. Let $A = (Q, T, \delta, q_0, Q_F)$ be a finite automaton. Let $f : T^* \to F$ be a distinguishing function. $A$ is called $f$-distinguishable if:

1. $A$ is deterministic.

2. For all states $q \in Q$ and all $x, y \in T^*$ with $\delta^*(q_0, x) = \delta^*(q_0, y) = q$, we have $f(x) = f(y)$.

(In other words, for $q \in Q$, $f(q) := f(x)$ for some $x$ with $\delta^*(q_0, x) = q$ is well-defined.)

3. For all $q_1, q_2 \in Q$, $q_1 \ne q_2$, with either (a) $q_1, q_2 \in Q_F$ or (b) there exist $q_3 \in Q$ and $a \in T$ with $\delta(q_1, a) = \delta(q_2, a) = q_3$, we have $f(q_1) \ne f(q_2)$.

A language is called $f$-distinguishable iff it can be accepted by an $f$-distinguishable automaton. The family of $f$-distinguishable languages is denoted by $f$-DL.
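The three conditions of Definition 2 can be checked mechanically when $f$ is given by its automaton $A_f$. This Python sketch assumes a deterministic automaton with transitions in a dict, so condition 1 holds by construction, and inspects only states accessible from the initial state. The examples, chosen for illustration, use $\sigma_0$ and $\sigma_1$ over the one-letter alphabet $\{a\}$.

```python
def is_f_distinguishable(A, Af):
    """Check Definition 2 for the accessible part of a deterministic automaton.

    A  = (delta, q0, finals), delta: dict (state, letter) -> state.
    Af = (delta_f, f0): the automaton of the distinguishing function f."""
    delta, q0, finals = A
    delta_f, f0 = Af
    # Condition 2: propagate f-values along transitions; a state receiving two
    # different values is reached by words x, y with f(x) != f(y).
    fval = {q0: f0}
    stack = [q0]
    while stack:
        q = stack.pop()
        for (p, a), r in delta.items():
            if p != q:
                continue
            v = delta_f[(fval[q], a)]
            if r not in fval:
                fval[r] = v
                stack.append(r)
            elif fval[r] != v:
                return False
    # Condition 3: distinct states with equal f-values must be neither both
    # final nor have a common successor on the same letter.
    states = list(fval)
    for i, q1 in enumerate(states):
        for q2 in states[i + 1:]:
            if fval[q1] != fval[q2]:
                continue
            if q1 in finals and q2 in finals:
                return False
            for (p, a), r in delta.items():
                if p == q1 and delta.get((q2, a)) == r:
                    return False
    return True

sigma0_f = ({("", "a"): ""}, "")                      # A_f for sigma_0
sigma1_f = ({("", "a"): "a", ("a", "a"): "a"}, "")    # A_f for sigma_1
loop = ({(0, "a"): 0}, 0, {0})                        # one state for a*
two_state = ({(0, "a"): 1, (1, "a"): 1}, 0, {0, 1})   # two states for a*
print(is_f_distinguishable(loop, sigma1_f), is_f_distinguishable(two_state, sigma1_f))  # False True
```

The printed pair illustrates that condition 2 constrains the automaton, not the language: the one-state automaton for $a^*$ mixes the $\sigma_1$-values of $\lambda$ and $a$, while a two-state automaton for the same language separates them.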

We need a suitable notion of a canonical automaton in the following. 

Definition 3. Let $f : T^* \to F$ be a distinguishing function and let $L \subseteq T^*$ be a regular set. Let $A(L, f)$ be the stripped subautomaton of the product automaton $A(L) \times A_f$, i.e., delete all states that are not accessible from the initial state or do not lead into a final state of $A(L) \times A_f$. $A(L, f)$ is called the $f$-canonical automaton of $L$.

Observe that the class $f$-DL formally fixes the alphabet of the languages by the domain of $f$. As we have already seen in the examples of distinguishing functions listed above, $f$ can often be defined for all alphabets. Taking this generic point of view, for example, Ter-DL is just the class of (reversals of) terminal distinguishable languages [4,16], where the alphabet is left unspecified.

Observe that, for each distinguishing function $f$, the associated automaton $A_f$ is $f$-distinguishable. This simple observation leads us to:

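Theorem 1. A language is function-distinguishable iff it is regular.

Proof. Every function-distinguishable language is accepted by a finite automaton and is therefore regular. Conversely, let $L$ be a regular language and consider its minimal automaton $A(L)$. It is quite easy to see that $A(L)$ is $f_L$-distinguishable, where $f_L$ is the distinguishing function induced by the Nerode equivalence classes of $L$. □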

In other words, $\{ f\text{-DL} \mid f \text{ is a distinguishing function} \}$ gives a finer classification of all regular languages. This finer classification is necessary, since it is well known that the class of all regular languages is not identifiable in the limit from positive data [10].

The following theorem generalizes the corresponding assertion for $k$-reversible languages as stated by Angluin [1].

Theorem 2. For each distinguishing function f , f-DL is closed under intersec- 
tion. 

Proof. The standard product automaton construction is applicable. □ 

By contrast, $f$-DL is in general closed neither under union nor under complement, see [1]. According to Pin [14], the union closure of the 0-reversible languages
is characterized by another class of regular languages which he calls reversible. 
He calls a language L reversible iff there is a finite automaton A accepting L 
such that A is deterministic and codeterministic but has possibly several initial 
and several accepting states. Sometimes, such automata are also called injective 
automata or permutation automata. 
5 An Alternative Presentation

In [6], we developed a generic merging state algorithm for $f$-DL which paralleled the approach of Angluin for 0-reversible languages. More precisely, the algorithm, when given an input sample $I_+$, starts with the prefix tree acceptor $\mathrm{PTA}(I_+)$ (as defined below). If $A_f(I_+)$ ($L_f(I_+)$, resp.) denotes the output automaton (output language, resp.) of the merging state inference algorithm when given $I_+$, then (disregarding automaton isomorphism) $A(L_f(I_+), f) = A_f(I_+)$, see [6]. In their works, Radhakrishnan and Nagaraja [16] do not start with the PTA of the given input data set $I_+$ but rather with a so-called “skeletal grammar” for $I_+$, which corresponds to the “maximal canonical automaton” $\mathrm{MCA}(I_+)$ in the framework of Dupont and Miclet [2]. Here, we describe a related algorithm for learning $f$-DL languages. In this way, we also obtain an alternative characterization of $f$-DL.

Consider an input sample set $I_+ = \{w_1, \ldots, w_M\} \subseteq T^+$.² Let $w_i = a_{i1} \ldots a_{in_i}$, where $a_{ij} \in T$, $1 \le i \le M$, $1 \le j \le n_i$. The skeletal automaton for the sample set is defined as

$A_S(I_+) = (Q_S, T, \delta_S, Q_0, Q_F)$, where

$Q_S = \{ q_{ij} \mid 1 \le i \le M, 1 \le j \le n_i + 1 \}$,

$\delta_S = \{ (q_{ij}, a_{ij}, q_{i,j+1}) \mid 1 \le i \le M, 1 \le j \le n_i \}$,

$Q_0 = \{ q_{i1} \mid 1 \le i \le M \}$ and

$Q_F = \{ q_{i,n_i+1} \mid 1 \le i \le M \}$.

Observe that we allow a set of initial states. The frontier string of $q_{ij}$ is defined by $\mathrm{FS}(q_{ij}) = a_{ij} \ldots a_{in_i}$. The head string of $q_{ij}$ is defined by the equation $\mathrm{HS}(q_{ij})\mathrm{FS}(q_{ij}) = w_i$, i.e., $\mathrm{HS}(q_{ij}) = a_{i1} \ldots a_{i,j-1}$. In other words, $\mathrm{HS}(q_{ij})$ is the unique string leading from an initial state into $q_{ij}$, and $\mathrm{FS}(q_{ij})$ is the unique string leading from $q_{ij}$ into a final state.³ Therefore, the skeletal automaton of a sample set simply spells all words of the sample set in a trivial fashion. Two things can be easily observed.

1. The state partition $\pi$ of $Q_S$ which groups the states $q, q'$ with $\mathrm{HS}(q) = \mathrm{HS}(q')$ yields the prefix tree acceptor, i.e., $\mathrm{PTA}(I_+) = \pi^{-1}A_S(I_+)$.

2. Since there is only one word leading to any $q$, namely $\mathrm{HS}(q)$, $f(q) = f(\mathrm{HS}(q))$ can be uniquely defined.

Now, for $q_{ij}, q_{k\ell} \in Q_S$, define $q_{ij} \sim_f q_{k\ell}$ iff (1) $\mathrm{HS}(q_{ij}) = \mathrm{HS}(q_{k\ell})$ or (2) $\mathrm{FS}(q_{ij}) = \mathrm{FS}(q_{k\ell})$, as well as $f(q_{ij}) = f(q_{k\ell})$.
The following assertion is easily verified:

Lemma 1. For each distinguishing function $f$ and each sample set $I_+$, $\sim_f$ is a reflexive symmetric relation on the set $Q_S$ of states of $A_S(I_+)$.

² The inclusion of the empty word would introduce some unnecessary technicalities.

³ To avoid unnecessary technical complications, we stress here that we are dealing with a sample set, i.e., we do not consider repetitions of sample words, which are allowed in Gold's model in general.
In general, $\sim_f$ is not an equivalence relation on the state set of $A_S$, as the following example shows:

Example 1. Consider the trivial distinguishing function $\sigma_0$ and $I_+ = \{a, aa\}$. The skeletal automaton has the following state transitions: $(q_{11}, a, q_{12})$, $(q_{21}, a, q_{22})$ and $(q_{22}, a, q_{23})$. Since $\mathrm{HS}(q_{11}) = \mathrm{HS}(q_{21}) = \lambda$ and $\mathrm{HS}(q_{12}) = \mathrm{HS}(q_{22}) = a$, as well as $\mathrm{FS}(q_{12}) = \mathrm{FS}(q_{23}) = \lambda$, $\mathrm{FS}(q_{11}) = \mathrm{FS}(q_{22}) = a$ and $\mathrm{FS}(q_{21}) = aa$, all states in $Q_S$ are $\equiv_{\sigma_0}$-equivalent, but $q_{11} \not\sim_{\sigma_0} q_{12}$.

Therefore, we define $\equiv_f := (\sim_f)^+$, denoting in this way the transitive closure of the original relation. The following lemma is again an easy exercise left to the reader.

Lemma 2. For each distinguishing function $f$ and each sample set $I_+$, $\equiv_f$ is an equivalence relation on the state set of $A_S(I_+)$.
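Example 1 can be reproduced with a short Python sketch. States $q_{ij}$ are represented by index pairs $(i, j)$ mapped to their head and frontier strings; $\sim_f$ is tested pairwise, and its transitive closure $\equiv_f$ is computed with a union-find structure, an implementation choice not prescribed by the text.

```python
def skeletal_states(sample):
    """Map each state q_ij of A_S(I+) (1-based, 1 <= j <= n_i + 1) to (HS, FS)."""
    return {(i, j): (w[:j - 1], w[j - 1:])
            for i, w in enumerate(sample, start=1)
            for j in range(1, len(w) + 2)}

def related(p, q, f):
    """p ~_f q for states given as (HS, FS) pairs; f(q) is f(HS(q))."""
    (hs1, fs1), (hs2, fs2) = p, q
    return hs1 == hs2 or (fs1 == fs2 and f(hs1) == f(hs2))

def equivalence_classes(sample, f):
    """Blocks of the transitive closure ==_f of ~_f, computed with union-find."""
    states = skeletal_states(sample)
    parent = {q: q for q in states}

    def find(q):
        while parent[q] != q:
            parent[q] = parent[parent[q]]  # path halving
            q = parent[q]
        return q

    keys = list(states)
    for x, p in enumerate(keys):
        for q in keys[x + 1:]:
            if related(states[p], states[q], f):
                parent[find(p)] = find(q)
    blocks = {}
    for q in keys:
        blocks.setdefault(find(q), set()).add(q)
    return list(blocks.values())

trivial = lambda w: 0                     # the trivial distinguishing function
states = skeletal_states(["a", "aa"])
print(len(equivalence_classes(["a", "aa"], trivial)))     # 1
print(related(states[(1, 1)], states[(1, 2)], trivial))   # False
```

With $\sigma_1$ (the last letter) instead of the trivial function, the same sample splits into two blocks, since $q_{11}$ and $q_{22}$ share a frontier string but not a $\sigma_1$-value.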

We now consider the automaton $\pi_f^{-1}A_S(I_+)$, where $\pi_f$ is the partition induced by the equivalence relation $\equiv_f$. We would like to show that $A_f(I_+) = \pi_f^{-1}A_S(I_+)$. As a preparatory step, we prove:

Lemma 3. For each distinguishing function $f$ and each sample set $I_+$, $\pi_f^{-1}A_S(I_+)$ is an $f$-distinguishable automaton.

Proof. We have to verify the three conditions posed upon $f$-distinguishable automata for $\pi_f^{-1}A_S(I_+)$. Let $\bar\delta$ denote the transition relation of $\pi_f^{-1}A_S(I_+)$ and $\bar q_0$ its initial state. (We use barred state notations for states of $\pi_f^{-1}A_S(I_+)$ and non-barred notations for states of $A_S(I_+)$.)

ad 1.: Consider an input word $w$ with $\bar q_1, \bar q_2 \in \bar\delta^*(\bar q_0, w)$. Then, there are some $q_{ij} \in \bar q_1$ and $q_{k\ell} \in \bar q_2$ (recall that $\bar q_1, \bar q_2$ are both sets of states of $A_S(I_+)$) with $\mathrm{HS}(q_{ij}) = w$ and $\mathrm{HS}(q_{k\ell}) = w$. Hence, $q_{ij} \sim_f q_{k\ell}$, which means that $\bar q_1 = \bar q_2$, since $\bar q_1$ and $\bar q_2$ are equivalence classes of states of $A_S(I_+)$.
ad 2.: Observe that $f(q)$ is well-defined for every state $q$ of $A_S(I_+)$. It is easy to check that if $q \sim_f q'$, then $f(q) = f(q')$. Since $q, q' \in \bar q$ iff $q \equiv_f q'$ iff $q\,(\sim_f)^+\,q'$, $f(q) = f(q')$ immediately follows by the transitivity of equality.
ad 3.: It can be shown similarly to point 1 (formally by induction). □

Theorem 3. For each distinguishing function $f$ and each sample set $I_+$, we have, up to isomorphism, $A_f(I_+) = \pi_f^{-1}A_S(I_+)$.

Proof. According to [2], we can consider $\pi_f^{-1}A_S(I_+)$ as being obtained by a sequence of merging state steps, merging only two states at a time. Without loss of generality, such a sequence of mergings might start with “repairing” violations of the determinism requirement, so that we obtain $\mathrm{PTA}(I_+)$ as an intermediate automaton. Similar to the reasoning in the previous lemma, the reader may verify that each of these merging steps can also be justified by the existence of conflicts in the merged states according to the inference algorithm sketched in the introduction. Since we have shown the correctness of that inference algorithm in [6], the assertion of this theorem follows as well. □
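Theorem 3 suggests a compact learner: build $A_S(I_+)$, compute the blocks of $\equiv_f$, and read off the quotient automaton. The following Python sketch does this for a sample given as a list of nonempty words. It relies on Lemma 3 for determinism (and asserts it); the function names are illustrative and not taken from [6].

```python
def learn(sample, f):
    """Return (delta, start, finals) of pi_f^{-1} A_S(I+) for a list of nonempty words."""
    info = {(i, j): (w[:j - 1], w[j - 1:])          # state q_ij -> (HS, FS)
            for i, w in enumerate(sample, start=1)
            for j in range(1, len(w) + 2)}
    parent = {q: q for q in info}

    def find(q):
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    keys = list(info)
    for x, p in enumerate(keys):
        for q in keys[x + 1:]:
            (hs1, fs1), (hs2, fs2) = info[p], info[q]
            if hs1 == hs2 or (fs1 == fs2 and f(hs1) == f(hs2)):   # p ~_f q
                parent[find(p)] = find(q)

    delta = {}
    for i, w in enumerate(sample, start=1):
        for j, a in enumerate(w, start=1):       # transition (q_ij, a_ij, q_i,j+1)
            src, dst = find((i, j)), find((i, j + 1))
            assert delta.get((src, a), dst) == dst, "not deterministic"
            delta[(src, a)] = dst
    start = find((1, 1))                         # all q_i1 share HS = lambda
    finals = {find((i, len(w) + 1)) for i, w in enumerate(sample, start=1)}
    return delta, start, finals

def accepts(automaton, word):
    delta, q, finals = automaton
    for a in word:
        if (q, a) not in delta:
            return False
        q = delta[(q, a)]
    return q in finals

h0 = learn(["a", "aa"], lambda w: 0)           # trivial distinguishing function
h1 = learn(["a", "aa"], lambda w: w[-1:])      # sigma_1
print(accepts(h0, ""), accepts(h1, ""))        # True False
```

On the sample $\{a, aa\}$ the trivial function yields a one-state hypothesis for $a^*$, whereas $\sigma_1$ keeps the initial state apart and yields $a^+$.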

This argument justifies the presentation of certain subcases of function dis- 
tinguishable languages as done in [4,7]. 
6 Approximation

Kobayashi and Yokomori introduced in [12,13] the notion of upper-best approximation in the limit of a target language with respect to the hypothesis space. They showed that regular languages can be upper-best approximated by $k$-reversible languages for any fixed $k$. Here, we shall prove that similar results are true for any class $f$-DL. In particular, this implies that, given any enumeration of an arbitrary regular language to some identification algorithm for $f$-DL, this algorithm will converge, yielding some well-defined result. Notably, the terminal distinguishable languages can be used to approximate all regular languages in a precise sense. This is interesting, since Radhakrishnan and Nagaraja already observed in [16], on an empirical basis, that their algorithm converges for regular languages, but not for context-free languages. The approximation notion developed by Kobayashi and Yokomori gives a mathematical explanation of this empirical observation.

First, we give the necessary definitions due to Kobayashi and Yokomori.

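Let $\mathcal{C}$ be a language class and $L$ be a language possibly outside $\mathcal{C}$. An upper-best approximation $\overline{\mathcal{C}}L$ of $L$ with respect to $\mathcal{C}$ is defined to be a language $L^* \in \mathcal{C}$ containing $L$ such that for any $L' \in \mathcal{C}$ with $L \subseteq L'$, $L^* \subseteq L'$ holds. If such an $L^*$ does not exist, $\overline{\mathcal{C}}L$ is undefined.

Remark 1. If $\mathcal{C}$ is closed under intersection, then $\overline{\mathcal{C}}L$ is uniquely defined.

Let $\mathcal{C}_1$ and $\mathcal{C}_2$ be two language classes. We say that $\mathcal{C}_1$ has the upper-best approximation property (u.b.a.p.) with respect to $\mathcal{C}_2$ iff, for every $L \in \mathcal{C}_2$, $\overline{\mathcal{C}_1}L$ is defined.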

Consider an inference machine $I$ to which an arbitrary language $L$ may be enumerated as input (possibly with repetitions) in an arbitrary order, i.e., $I$ receives an infinite input stream of words $E(1), E(2), \ldots$, where $E : \mathbb{N} \to L$ is an enumeration of $L$. We say that $I$ identifies an upper-best approximation of $L$ in the limit (from positive data) by $\mathcal{C}$ if $I$ reacts to an enumeration of $L$ with an output device stream $D_i \in \mathcal{D}$ such that there is an $N(E)$ so that, for all $n \ge N(E)$, we have $D_n = D_{N(E)}$ and, moreover, the language defined by $D_{N(E)}$ equals $\overline{\mathcal{C}}L$. A language class $\mathcal{C}_1$ is called upper-best approximately identifiable in the limit (from positive data) by $\mathcal{C}_2$ iff there exists an inference machine $I$ which identifies an upper-best approximation of each $L \in \mathcal{C}_1$ in the limit (from positive data) by $\mathcal{C}_2$. Observe that this notion of identifiability coincides with Gold's classical notion of learning in the limit in the case when $\mathcal{C}_1 = \mathcal{C}_2$.

Consider a language class $\mathcal{C}$ and a language $L$ from it. A finite subset $F \subseteq L$ is called a characteristic sample of $L$ with respect to $\mathcal{C}$ iff, for any $L' \in \mathcal{C}$, $F \subseteq L'$ implies that $L \subseteq L'$.

Now, fix some distinguishing function $f$. We call a language $L \subseteq T^*$ pseudo-$f$-distinguishable iff, for all $u_1, u_2, v \in T^*$ with $f(u_1) = f(u_2)$, we have $u_1^{-1}L = u_2^{-1}L$ whenever $\{u_1v, u_2v\} \subseteq L$. By the characterization theorem derived in [6], $L \in f$-DL iff $L$ is pseudo-$f$-distinguishable and regular.
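For a finite language the defining condition can be checked by enumerating, for every pair of words, their common suffixes $v$. This Python sketch covers finite $L$ only (infinite languages would need an automaton-based test) and its names are illustrative.

```python
def quotient(u, language):
    """u^{-1}L for a finite language L."""
    return frozenset(w[len(u):] for w in language if w.startswith(u))

def is_pseudo_f_distinguishable(language, f):
    """For finite L: whenever u1 v and u2 v are in L and f(u1) = f(u2),
    require u1^{-1}L = u2^{-1}L."""
    for x in language:
        for y in language:
            for n in range(min(len(x), len(y)) + 1):   # n = |v|
                if x[len(x) - n:] != y[len(y) - n:]:
                    break                               # longer suffixes differ too
                u1, u2 = x[:len(x) - n], y[:len(y) - n]
                if f(u1) == f(u2) and quotient(u1, language) != quotient(u2, language):
                    return False
    return True

trivial = lambda w: 0
print(is_pseudo_f_distinguishable({"a", "aa"}, trivial))  # False: u1 = lambda, u2 = a, v = a
```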

Immediately from the definition, we may conclude: 

Proposition 1. Let $L_1 \subseteq L_2 \subseteq \ldots$ be any ascending sequence of pseudo-$f$-distinguishable languages. Then, $\bigcup_{i \ge 1} L_i$ is pseudo-$f$-distinguishable. □
For brevity, we write $u_1 \equiv_{L,f} u_2$ iff $u_1^{-1}L = u_2^{-1}L$ and $f(u_1) = f(u_2)$.

Remark 2. If $L \subseteq T^*$ is a regular language and if $f : T^* \to F$ is some distinguishing function, then the number of equivalence classes of $\equiv_{L,f}$ is at most the number of states of $A(L)$ (plus one) times $|F|$; more precisely, it is at most the number of states of $A(L, f)$ (plus $|F|$).

Let $L \subseteq T^*$ be some language. For any integer $i \ge 0$, we define $R_f(i, L)$ as follows:

1. $R_f(0, L) = L$ and

2. $R_f(i, L) = R_f(i-1, L) \cup \{ u_2w \mid u_1v, u_2v, u_1w \in R_f(i-1, L) \wedge f(u_1) = f(u_2) \}$ for $i \ge 1$.

Furthermore, set $R_f(L) = \bigcup_{i \ge 0} R_f(i, L)$.
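Since $R_f(L)$ is usually infinite, a program can only approximate it. The following Python sketch iterates the definition of $R_f(i, L)$ while discarding words longer than a cutoff `max_len`, a parameter introduced here for illustration. Because a short word might be derivable only through a longer intermediate word, the result is guaranteed only to be a subset of $R_f(L) \cap T^{\le \mathit{max\_len}}$.

```python
def r_f(language, f, max_len):
    """Iterate R_f(i, L) = R_f(i-1, L) | {u2 w : u1 v, u2 v, u1 w in R_f(i-1, L),
    f(u1) = f(u2)}, keeping only words of length <= max_len."""
    current = {w for w in language if len(w) <= max_len}
    while True:
        new = set(current)
        for x in current:
            for y in current:
                for n in range(min(len(x), len(y)) + 1):        # n = |v|
                    if x[len(x) - n:] != y[len(y) - n:]:
                        break
                    u1, u2 = x[:len(x) - n], y[:len(y) - n]
                    if f(u1) != f(u2):
                        continue
                    for z in current:                            # z = u1 w
                        if z.startswith(u1):
                            candidate = u2 + z[len(u1):]
                            if len(candidate) <= max_len:
                                new.add(candidate)
        if new == current:
            return current
        current = new

print(sorted(r_f({"a", "aa"}, lambda w: 0, 4)))  # ['', 'a', 'aa', 'aaa', 'aaaa']
```

With the trivial function the iteration recovers $a^*$ up to the cutoff, in line with the 0-reversible hull of $\{a, aa\}$; with $\sigma_1$ the empty word is never generated.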

Observe that, by definition, a language is pseudo-$k$-reversible [13] iff it is pseudo-$\sigma_k$-distinguishable. Moreover, the operator $R_k$ introduced in [13] is written $R_{\sigma_k}$ in our notation.

Since $R_f$ turns out to be a hull operator, the following statement is obvious.

Proposition 2. For any language $L$ and any distinguishing function $f$, $R_f(L)$ is the smallest pseudo-$f$-distinguishable language containing $L$. □

Lemma 4. Let $L \subseteq T^*$ be any language. If $u_1$ and $u_2$ are prefixes of $L$, then $u_1 \equiv_{L,f} u_2$ implies that $u_1^{-1}R_f(L) = u_2^{-1}R_f(L)$.

Proof. Let $u_1$ and $u_2$ be prefixes of $L$ with $u_1 \equiv_{L,f} u_2$. By definition of $\equiv_{L,f}$, $u_1^{-1}L = u_2^{-1}L \ne \emptyset$. Hence, there is a string $v$ so that $\{u_1v, u_2v\} \subseteq L \subseteq R_f(L)$. Furthermore, by definition of $\equiv_{L,f}$, $f(u_1) = f(u_2)$. Since $R_f(L)$ is pseudo-$f$-distinguishable due to Proposition 2, $u_1^{-1}R_f(L) = u_2^{-1}R_f(L)$. □

Lemma 5. Let $L \subseteq T^*$ be any language and let $f$ be any distinguishing function. Then, for any prefix $w_1$ of $R_f(L)$, there exists a prefix $w_2$ of $L$ with $w_1^{-1}R_f(L) = w_2^{-1}R_f(L)$.

Proof. Since $w_1$ is a prefix of $R_f(L)$ iff $w_1$ is a prefix of $R_f(i, L)$ for some $i \ge 0$, it suffices to show the following claim by induction:

Let $i \ge 0$. Then, for any prefix $w_1$ of $R_f(i, L)$, there exists a prefix $w_2$ of $L$ with $w_1^{-1}R_f(L) = w_2^{-1}R_f(L)$.

Trivially, the claim is true when $i = 0$, since $R_f(0, L) = L$.

As induction hypothesis, assume that the claim is shown for $i = \ell$. Hence, we have to consider some $w_1 \in \mathrm{Pref}(R_f(\ell+1, L)) \setminus \mathrm{Pref}(R_f(\ell, L))$ in the induction step. Consider some $w_1z \in R_f(\ell+1, L) \setminus R_f(\ell, L)$. This means that there are strings $u_1, u_2, v, w \in T^*$ with $\{u_1v, u_2v, u_1w\} \subseteq R_f(\ell, L)$, $f(u_1) = f(u_2)$ and $u_2w = w_1z$. If $|u_2| \ge |w_1|$, then $w_1$ is a prefix of $u_2$ and hence of $u_2v \in R_f(\ell, L)$, in contrast to our assumption. Therefore, we have $w_1 = u_2v'$ for some $v' \in T^+$. Since $R_f(L)$ is pseudo-$f$-distinguishable and $\{u_1v, u_2v\} \subseteq R_f(L)$ as well as $f(u_1) = f(u_2)$, we get $u_1^{-1}R_f(L) = u_2^{-1}R_f(L)$, which yields $w_1^{-1}R_f(L) = (u_2v')^{-1}R_f(L) = (u_1v')^{-1}R_f(L)$. Since $v'$ is a prefix of $w$, $u_1v'$ is a prefix of $u_1w \in R_f(\ell, L)$. By induction hypothesis, there is a prefix $w_2$ of $L$ such that $w_1^{-1}R_f(L) = (u_1v')^{-1}R_f(L) = w_2^{-1}R_f(L)$. □
By a reasoning completely analogous to [13], we may conclude:

Theorem 4. For any distinguishing function f, the class f-DL has the u.b.a.p. 
with respect to the class of regular languages. □ 

Observe that the number of states of $A(R_f(L))$ is closely related to the number of states of $A(L, f)$, see Remark 2.

Theorem 5. For any distinguishing function f , the class of regular languages is 
upper-best approximately identifiable in the limit from positive data by f-DL. □ 

7 Discussion 

We have proposed a large collection of families of languages, each of which is 
identifiable in the limit from positive samples, hence extending previous works. 
We feel that deterministic methods yielding characterizable regular subclasses (such as the ones proposed in this paper) are quite important for practical applications, since they can be understood more precisely than mere heuristics, so that one can prove certain properties about the algorithms. Moreover, the approach of this paper allows one to make the bias (which each regular language identification algorithm necessarily has) explicit and transparent to the user: The bias consists in (1) the restriction to regular languages and (2) the choice of a particular distinguishing function $f$. Detailed comments in this direction can be found in [8].

We will provide a publicly accessible prototype learning algorithm for (each of the families) $f$-DL in the near future. A user can then first look for an appropriate $f$ by making learning experiments with typical languages he expects to be representative for the languages in his particular application. If there are only a few “typical languages” $L_1, \ldots, L_r$ in the beginning, one could also start with $f_{L_1} \times \cdots \times f_{L_r}$, where $f \times g$ is defined as $(f \times g)(x) = (f(x), g(x))$, see the proof of Theorem 1. After this “bias training phase”, the user may then use the learning algorithm chosen in this way (or, better, an improved implementation for the specific choice of $f$) for his actual application.
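A minimal Python sketch of the product $f_{L_1} \times \cdots \times f_{L_r}$. It assumes each $f_{L_i}$ is given as the state function of a complete DFA for $L_i$ (for the minimal complete DFA this is exactly the Nerode class map); the two example DFAs and all names are illustrative, and the distinguishing property is only tested on short words.

```python
from itertools import product

def product_function(*fs):
    """(f_1 x ... x f_r)(x) = (f_1(x), ..., f_r(x))."""
    return lambda x: tuple(f(x) for f in fs)

def dfa_function(delta, q0):
    """f(w) = state reached by a complete DFA on w; distinguishing, because the
    state after wu depends only on the state after w and on u."""
    def f(w):
        q = q0
        for a in w:
            q = delta[(q, a)]
        return q
    return f

def is_distinguishing_up_to(f, alphabet, max_len, ext_len=2):
    """Bounded brute-force test of f(w) = f(z) => f(wu) = f(zu)."""
    ws = ["".join(t) for n in range(max_len + 1) for t in product(alphabet, repeat=n)]
    us = ["".join(t) for n in range(ext_len + 1) for t in product(alphabet, repeat=n)]
    return all(f(w + u) == f(z + u)
               for w in ws for z in ws if f(w) == f(z) for u in us)

# Two "typical languages": an even number of a's, and words ending in b.
f_even_a = dfa_function({(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0)
f_ends_b = dfa_function({(0, "a"): 0, (1, "a"): 0, (0, "b"): 1, (1, "b"): 1}, 0)
g = product_function(f_even_a, f_ends_b)
print(g("aab"))  # (0, 1)
```

The product keeps the information of every component, so the resulting class $g$-DL refines the bias of each single $f_{L_i}$.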

Even if the particular class f-DL chosen by the user does not completely 
comprise all languages the identification machine IM will be confronted with, 
Theorem 5 suggests that, in the case that a regular language which does not lie 
in f-DL is enumerated to IM, some reasonable outcome will be produced in a 
reasonable time. 

If the application suggests that the languages which are to be inferred are 
non-regular, methods such as those suggested in [15] can be transferred. This is 
done most easily by using the concept of control languages as undertaken in [3,4] 
or [17, Section 4] or by using the related concept of permutations, see [9]. 



Acknowledgment 

We gratefully acknowledge discussions with S. Kobayashi. 




232 Henning Fernau 



References 

1. D. Angluin. Inference of reversible languages. Journal of the Association for 
Computing Machinery, 29(3):741-765, 1982. 

2. P. Dupont and L. Miclet. Inférence grammaticale régulière: fondements théoriques
et principaux algorithmes. Technical Report RR-3449, INRIA, 1998.

3. H. Fernau. Learning of terminal distinguishable languages. Technical Report
WSI-99-23, Universität Tübingen (Germany), Wilhelm-Schickard-Institut für
Informatik, 1999. Short version published in the proceedings of AMAI 2000, see
http://rutcor.rutgers.edu/~amai/AcceptedCont.htm.

4. H. Fernau. Identifying terminal distinguishable languages. Submitted revised 
version of [3]. 

5. H. Fernau. k-gram extensions of terminal distinguishable languages. In Proc. 15th
International Conference on Pattern Recognition, 2nd Volume, pages 125-128, IEEE
Press, 2000.

6. H. Fernau. Identification of function distinguishable languages. In Proc. 11th 
International Conference Algorithmic Learning Theory (ALT), volume 1968 of 
LNCS/LNAI, pages 116-130. Springer, 2000. 

7. H. Fernau. Parallel communicating grammar systems with terminal transmission. 
Acta Informatica, 37:511-540, 2001. 

8. H. Fernau. Learning XML Grammars. In Proc. 2nd Machine Learning and Data
Mining in Pattern Recognition MLDM'01, volume 2123 of LNCS/LNAI, pages 73-87.
Springer, 2001.

9. H. Fernau and J. M. Sempere. Permutations and control sets for learning non- 
regular language families. In Proc. 5th International Colloquium on Grammatical 
Inference (ICGI): Algorithms and Applications, volume 1891 of LNCS/LNAI, pages 
75-88. Springer, 2000. 

10. E. M. Gold. Language identification in the limit. Information and Control (now 
Information and Computation), 10:447-474, 1967. 

11. J. Gregor. Data-driven inductive inference of finite-state automata. International
Journal of Pattern Recognition and Artificial Intelligence, 8(1):305-322, 1994.

12. S. Kobayashi and T. Yokomori. On approximately identifying concept classes in the 
limit. In Proc. 6th International Conference Algorithmic Learning Theory (ALT), 
volume 997 of LNCS/LNAI, pages 298-312. Springer, 1995. 

13. S. Kobayashi and T. Yokomori. Learning approximately regular languages with 
reversible languages. Theoretical Computer Science, 174:251-257, 1997. 

14. J.E. Pin. On the languages accepted by finite reversible automata. In 14th
ICALP'87, volume 267 of LNCS, pages 237-249, 1987.

15. V. Radhakrishnan. Grammatical Inference from Positive Data: An Effective Inte- 
grated Approach. PhD thesis. Department of Computer Science and Engineering, 
Indian Institute of Technology, Bombay (India), 1987. 

16. V. Radhakrishnan and G. Nagaraja. Inference of regular grammars via skeletons. 
IEEE Transactions on Systems, Man and Cybernetics, 17(6):982-992, 1987. 

17. Y. Takada. A hierarchy of language families learnable by regular language learning. 
Information and Computation, 123:138-145, 1995. 




Quantum Finite State Transducers 



Rusins Freivalds¹ and Andreas Winter²

¹ Institute of Mathematics and Computer Science, University of Latvia,
Raina bulvaris 29, LV-1459, Riga, Latvia
Rusins.Freivalds@mii.lu.lv
² Department of Computer Science, University of Bristol,

Merchant Venturers Building, Woodland Road, Bristol BS8 1UB, United Kingdom

winter@cs.bris.ac.uk



Abstract. We introduce quantum finite state transducers (qfst), and
study the class of relations which they compute. It turns out that they
share many features with probabilistic finite state transducers, especially
regarding undecidability of emptiness (at least for low probability of
success). However, like their ‘little brothers’, the quantum finite automata,
the power of qfst is incomparable to that of their probabilistic
counterpart. This we show by discussing a number of characteristic examples.



1 Introduction and Definitions 

The aim of this work is to introduce and study the computational model of
quantum finite state transducers. These can be understood as finite automata
with the addition of an output tape; they compute a relation between strings
instead of a decision (which we read as a binary-valued function). After the
necessary definitions, the relation to quantum finite automata is clarified
(Section 2); then decidability questions are addressed (Section 3): it is shown that
emptiness of the computed relation is undecidable both for quantum and
probabilistic transducers. However, the membership problem for a specific output is
decidable. Next, the relation between deterministic and probabilistic transducers
is explored (Section 4), and in Section 5 quantum and probabilistic transducers
are compared.

We feel our extension of quantum automata studies to this new model justi- 
fied by the following quote from D. Scott [10]: 

‘The author (along with many other people) has come recently to the con- 
clusion that the functions computed by the various machines are more 
important - or at least more basic - than the sets accepted by these de- 
vices. (...) In fact by putting the functions first, the relationship between 
various classes of sets becomes much clearer’. 

We start by reviewing the concept of a probabilistic finite state transducer. For
a finite set $X$ we denote by $X^*$ the set of all finite strings formed from $X$; the
empty string is denoted $\varepsilon$.
Definition 1 A probabilistic finite state transducer (pfst) is a tuple
$$T = (Q, \Sigma_1, \Sigma_2, V, f, q_0, Q_{acc}, Q_{rej}),$$

where $Q$ is a finite set of states, $\Sigma_1, \Sigma_2$ are the input and output alphabets, $q_0 \in Q$ is
the initial state, and $Q_{acc}, Q_{rej} \subseteq Q$ are (disjoint) sets of accepting and rejecting
states, respectively. (The other states, forming the set $Q_{non}$, are called non-halting.)
The transition function $V$ is such that for all $a \in \Sigma_1$ the matrix
$(V_a)_{qp}$ is stochastic, and $f_a : Q \to \Sigma_2^*$ is the output function. If all matrix entries
are either 0 or 1, the machine is called a deterministic finite state transducer
(dfst).

The meaning of this definition is that, being in state $q$ and reading input symbol
$a$, the transducer prints $f_a(q)$ on the output tape and changes to state $p$ with
probability $(V_a)_{qp}$, moving input and output head to the right. After each such
step, if the machine is found in a halting state, the computation stops, accepting
or rejecting the input, respectively.

To capture this formally, we introduce the total state of the machine, which
is an element

$$(P_{NON}, P_{ACC}, p_{rej}) \in \ell^1(Q \times \Sigma_2^*) \oplus \ell^1(\Sigma_2^*) \oplus \ell^1(\{\mathrm{REJ}\}),$$

with the natural norm

$$\|(P_{NON}, P_{ACC}, p_{rej})\| = \|P_{NON}\|_1 + \|P_{ACC}\|_1 + |p_{rej}|.$$

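At the beginning, the total state is $((q_0, \varepsilon), 0, 0)$ (where we identify an element
of $Q \times \Sigma_2^*$ with its characteristic function). The computation is represented by
the (linear extensions of the) transformations

$$T_a : ((q, w), P_{ACC}, p_{rej}) \mapsto \Big( \sum_{p \in Q_{non}} (V_a)_{qp}\, (p, w f_a(q)),\; P'_{ACC},\; p'_{rej} \Big)$$

of the total state, for $a \in \Sigma_1$, with

$$P'_{ACC}(x) = \begin{cases} P_{ACC}(x) + \sum_{p \in Q_{acc}} (V_a)_{qp} & \text{if } x = w f_a(q), \\ P_{ACC}(x) & \text{else,} \end{cases}$$

and $p'_{rej} = p_{rej} + \sum_{p \in Q_{rej}} (V_a)_{qp}$.

For a string $x_1 \ldots x_n$ the map $T_{x_1 \ldots x_n}$ is just the composition of the $T_{x_i}$. Observe
that all the $T_a$ conserve the probability.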

Implicitly, we add initial and end marker symbols ($\text{¢}$ and $\$$) to the input, with
additional stochastic matrices $V_{\text{¢}}$ and $V_{\$}$, executed only at the very beginning
and at the very end. We assume that $V_{\$}$ puts no probability outside $Q_{acc} \cup Q_{rej}$.

By virtue of the computation, to each input string $v \in \Sigma_1^*$ there corresponds
a probability distribution $T(\cdot|v)$ on the set $\Sigma_2^* \cup \{\mathrm{REJ}\}$:

$$T(\mathrm{REJ}|v) := T_{\text{¢}v\$}((q_0, \varepsilon), 0, 0)[\mathrm{REJ}]$$

is the probability to reject the input $v$, whereas

$$T(w|v) := T_{\text{¢}v\$}((q_0, \varepsilon), 0, 0)[w]$$

is the probability to accept after having produced the output $w$.
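The evolution of the total state can be simulated directly. The following Python sketch is our own illustration: the dictionary encoding of $V$ and $f$ and the toy machine are assumptions, not part of the definition. It applies the transformations $T_a$ for $\text{¢}$, the letters of $v$, and $\$$, and returns the accepted-output distribution $T(\cdot|v)$ together with $T(\mathrm{REJ}|v)$, using exact rational arithmetic so that conservation of probability can be checked exactly.

```python
from collections import defaultdict
from fractions import Fraction

START, END = "¢", "$"


def run_pfst(V, f, q0, acc, rej, v):
    """Return ({w: T(w|v)}, T(REJ|v)) for a pfst encoded as dictionaries.

    V[a][q] maps each successor p to (V_a)_{qp}; f[a][q] is the output f_a(q)
    (a missing entry means the empty output). Only reachable rows are needed.
    """
    p_non = {(q0, ""): Fraction(1)}  # P_NON: (state, output so far) -> probability
    p_acc = defaultdict(Fraction)    # P_ACC: output -> probability
    p_rej = Fraction(0)
    for a in [START, *v, END]:
        nxt = defaultdict(Fraction)
        for (q, w), prob in p_non.items():
            w_new = w + f[a].get(q, "")  # print f_a(q)
            for p, t in V[a][q].items():
                if p in acc:
                    p_acc[w_new] += prob * t
                elif p in rej:
                    p_rej += prob * t
                else:
                    nxt[(p, w_new)] += prob * t
        p_non = nxt
    return dict(p_acc), p_rej


# Hypothetical toy pfst: on ¢ it moves with probability 1/2 each to a "copy"
# state c (echoes every input letter) or a "silent" state s (prints nothing).
V = {
    START: {"q0": {"c": Fraction(1, 2), "s": Fraction(1, 2)}},
    "a": {"c": {"c": 1}, "s": {"s": 1}},
    "b": {"c": {"c": 1}, "s": {"s": 1}},
    END: {"c": {"acc": 1}, "s": {"acc": 1}},
}
f = {START: {}, "a": {"c": "a"}, "b": {"c": "b"}, END: {}}
dist, p_rej = run_pfst(V, f, "q0", {"acc"}, set(), "ab")
print(dist, p_rej)
```

On input `ab` the toy machine accepts with output `ab` and with the empty output, each with probability 1/2, and rejects with probability 0.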

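Definition 2 Let $\mathcal{R} \subseteq \Sigma_1^* \times \Sigma_2^*$.

For $\alpha > 1/2$ we say that $T$ computes the relation $\mathcal{R}$ with probability $\alpha$ if for
all $v$, whenever $(v,w) \in \mathcal{R}$, then $T(w|v) \geq \alpha$, and whenever $(v,w) \notin \mathcal{R}$, then
$T(w|v) \leq 1 - \alpha$.

For $0 < \alpha < 1$ we say that $T$ computes the relation $\mathcal{R}$ with isolated cutpoint $\alpha$
if there exists $\varepsilon > 0$ such that for all $v$, whenever $(v,w) \in \mathcal{R}$, then $T(w|v) \geq \alpha + \varepsilon$,
but whenever $(v,w) \notin \mathcal{R}$, then $T(w|v) \leq \alpha - \varepsilon$.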

The following definition is modelled after the one for pfst and after the one for quantum finite
state automata [8]:

Definition 3 A quantum finite state transducer (qfst) is a tuple $T = (Q, \Sigma_1,
\Sigma_2, V, f, q_0, Q_{acc}, Q_{rej})$, where $Q$ is a finite set of states, $\Sigma_1, \Sigma_2$ are the input and output
alphabets, $q_0 \in Q$ is the initial state, and $Q_{acc}, Q_{rej} \subseteq Q$ are (disjoint) sets of
accepting and rejecting states, respectively. The transition function $V$ is such that
for all $a \in \Sigma_1$ the matrix $(V_a)_{qp}$ is unitary, and $f_a : Q \to \Sigma_2^*$ is
the output function.

As before, matrices $V_{\text{¢}}$ and $V_{\$}$ are implicitly assumed, $V_{\$}$ carrying no
amplitude from $Q_{non}$ to outside $Q_{acc} \cup Q_{rej}$. The computation proceeds as follows:
being in state $q$ and reading $a$, the machine prints $f_a(q)$ on the output tape
and moves to the superposition $V_a|q\rangle = \sum_p (V_a)_{qp}|p\rangle$ of internal states. Then
a measurement of the orthogonal decomposition $E_{non} \oplus E_{acc} \oplus E_{rej}$ (with the
subspaces $E_i = \operatorname{span} Q_i \subseteq \ell^2(Q)$, which we identify with their respective
projections) is performed, stopping the computation and accepting the input on the
second outcome (while observing the output), or rejecting it on the third.
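Here, too, we define total states: these are elements

$$(|\psi_{NON}\rangle, P_{ACC}, p_{rej}) \in \ell^2(Q \times \Sigma_2^*) \oplus \ell^1(\Sigma_2^*) \oplus \ell^1(\{\mathrm{REJ}\}),$$

with norm

$$\|(|\psi_{NON}\rangle, P_{ACC}, p_{rej})\| = \||\psi_{NON}\rangle\|_2 + \|P_{ACC}\|_1 + |p_{rej}|.$$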

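At the beginning the total state is $(|q_0\rangle \otimes |\varepsilon\rangle, 0, 0)$; the total state transformations,
for

$$|\psi\rangle = \sum_{q \in Q} |q\rangle \otimes |\omega_q\rangle, \quad \text{with } |\omega_q\rangle = \sum_w \alpha_{qw} |w\rangle,$$

are (for $a \in \Sigma_1$)

$$T_a : (|\psi\rangle, P_{ACC}, p_{rej}) \mapsto \Big( E_{non} \sum_q V_a|q\rangle \otimes |\omega_q f_a(q)\rangle,\; P'_{ACC},\; p'_{rej} \Big),$$

where $|\omega_q f_a(q)\rangle = \sum_w \alpha_{qw} |w f_a(q)\rangle$, and

$$P'_{ACC}(x) = P_{ACC}(x) + \Big\| E_{acc} \sum_{q,w \text{ s.t. } x = w f_a(q)} \alpha_{qw} V_a|q\rangle \Big\|_2^2,$$

$$p'_{rej} = p_{rej} + \Big\| E_{rej} \sum_q V_a|q\rangle \otimes |\omega_q f_a(q)\rangle \Big\|_2^2.$$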

Observe that the $T_a$ do not exactly preserve the norm, but that there is a constant
$\gamma$ such that $\|T_a(X)\| \leq \gamma \|X\|$ for any total state $X$. The distributions
$T(\cdot|v)$ are then defined in the obvious way, and so are the concepts of computation with
probability $\alpha$ or with isolated cutpoint $\alpha$.

Observe also that we defined our model in the closest possible analogy to quantum
finite automata [8]. This is of course to be able to compare qfst to the
latter. In principle, however, other definitions are conceivable, e.g. a mixed-state
computation where the $T_a$ are any completely positive, trace-preserving linear
maps (the same of course applies to quantum finite automata!). We defer the
study of such a model to another occasion.

Notice the physical benefits of having the output tape: whereas for finite 
automata a superposition of states means that the amplitudes of the various 
transitions are to be added, this is no longer true for transducers if we face a 
superposition of states with different output tape content. I.e. the entanglement 
of the internal state with the output may prohibit certain interferences. This 
will be a crucial feature in some of our later constructions. 
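A toy calculation illustrates this effect. In the following Python sketch (all states, amplitudes, and outputs are hypothetical, and the unitaries are only written out on the states that matter), two computation paths reach the state `s` with amplitudes $+1/2$ and $-1/2$. If both paths print the same output, their amplitudes are added and cancel; if they print different outputs, they belong to different basis vectors $|s\rangle \otimes |w\rangle$ and the probability of `s` stays $1/2$. There are no halting states in the toy, so the measurements after each step are trivial.

```python
import math
from collections import defaultdict

H = 1 / math.sqrt(2)

# Hypothetical unitary steps: V1 sends q0 to (p0 + p1)/sqrt(2);
# V2 sends p0 to (r + s)/sqrt(2) and p1 to (r - s)/sqrt(2).
V1 = {"q0": {"p0": H, "p1": H}}
V2 = {"p0": {"r": H, "s": H}, "p1": {"r": H, "s": -H}}


def step(psi, V, f):
    """One step on |state> (x) |output>: print f(q), then apply V to the state."""
    out = defaultdict(float)
    for (q, w), amp in psi.items():
        for p, a in V[q].items():
            out[(p, w + f.get(q, ""))] += amp * a
    return dict(out)


def prob_of_state(psi, state):
    """Probability of observing the given internal state, summed over outputs."""
    return sum(abs(amp) ** 2 for (p, _), amp in psi.items() if p == state)


psi1 = step({("q0", ""): 1.0}, V1, {})
same_output = step(psi1, V2, {})                            # both paths print nothing
different_output = step(psi1, V2, {"p0": "0", "p1": "1"})  # paths print 0 resp. 1
print(prob_of_state(same_output, "s"), prob_of_state(different_output, "s"))
```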



2 Quantum Finite Automata and Quantum Transducers 

The definition of qfst is tailored in such a way that by excluding the output tape
and the output function, we get a quantum finite automaton, albeit one with
different acceptance and rejection properties compared to the qfst. Nevertheless,
the decision capabilities of qfst equal those of quantum finite automata:



Theorem 4 A language $L$ is accepted by a 1-way quantum finite automaton
with probability bounded away from 1/2 if and only if the relation $L \times \{0\} \cup \overline{L} \times \{1\}$
is computed with isolated cutpoint.

Proof: First observe that for finite automata (probabilistic and quantum),
recognizability with an isolated cutpoint is equivalent to recognizability with
probability bounded away from 1/2 (by “shifting the cutpoint”: just add in the $\text{¢}$-step
possibilities to accept or reject right away with certain probabilities). We have
to exhibit two constructions:

Let there be given a quantum finite automaton. We may assume that it is
such that $V_{\$}$ is a permutation on $Q$.

This can be forced by duplicating each $q \in Q_{acc} \cup Q_{rej}$ by a new state $q'$, and
modifying the transition function as follows: denote by $\sigma$ the map interchanging
$q$ with $q'$ for $q \notin Q_{non}$, and being the identity on $q, q'$ for $q \in Q_{non}$. Define a
unitary $U$ such that for $q \in Q_{non}$

$$U|q\rangle = \sum_p (V_{\$})_{qp} |\sigma p\rangle,$$

and $U|q\rangle = |q\rangle$ for $q \in Q_{acc} \cup Q_{rej}$. Now let

$$V'_{\text{¢}} := U V_{\text{¢}}, \quad V'_{\$} := \sigma, \quad V'_a := U V_a U^{-1}.$$

It is easily checked that this automaton behaves exactly like the initial one.

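Construct a qfst as follows: its states are $Q \cup \hat{Q}$, with $\hat{Q} = \{\hat{q} : q \in Q_{acc} \cup Q_{rej}\}$
being the accepting states, and no rejecting states. Let the transition function
be $W$ with

$$W_a|q\rangle = V_a|q\rangle \text{ for } q \in Q_{non}, \quad \text{but} \quad W_a|q\rangle = |q\rangle \text{ for } q \in Q_{acc} \cup Q_{rej}.$$

Since $V_{\$}$ is the permutation $\sigma$ on $Q$, we may define

$$W_{\$}|q\rangle = \begin{cases} |\widehat{\sigma q}\rangle & \text{for } \sigma q \in Q_{acc} \cup Q_{rej}, \\ |\sigma q\rangle & \text{for } \sigma q \in Q_{non}. \end{cases}$$

Finally, let the output function be (for $q \in Q$)

$$f_a(q) = \begin{cases} 0 & \text{for } q \in Q_{acc}, \\ 1 & \text{for } q \in Q_{rej}, \end{cases} \qquad f_{\$}(q) = \begin{cases} 0 & \text{for } \sigma q \in Q_{acc}, \\ 1 & \text{for } \sigma q \in Q_{rej}, \end{cases}$$

and $\varepsilon$ in all other cases. It can be checked that it behaves in the desired way.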

Given a qfst, construct a quantum finite automaton as follows: its states
are $Q \times \Sigma_2^{\leq t}$, where the second component represents the tape content up to
$t = 1 + \max_{a,q} |f_a(q)|$ many symbols. The initial state is $(q_0, \varepsilon)$. Observe that, by
definition of the $T_a$, amplitude that has once been shifted onto output tapes of length
larger than 1 is never recovered for smaller lengths. Hence we may as well cut
such branches by immediate rejection: the states in $Q \times \Sigma_2^{\leq t}$ whose second component has
length larger than 1 are all rejecting, and so are $(Q_{acc} \cup Q_{rej}) \times \{1\}$. The accepting states are $Q_{acc} \times \{0\}$.

The transition function is partially defined by

$$W_a|q, x\rangle := \sum_{p \in Q} (V_a)_{qp} |p, x f_a(q)\rangle, \quad x \in \Sigma_2 \cup \{\varepsilon\}$$

(for $a = \$$ this is followed by mapping $|p, \varepsilon\rangle$ to a rejecting state, while leaving
the other halting states alone), i.e. the automaton performs like the qfst on the
elements of $Q$, and uses the second component to simulate the output tape. We
think of $W_a$ as being extended in an arbitrary way to a unitary map. One can check
that this construction behaves in the desired way. □
3 Decidability Questions

As is well known, the emptiness problem for the language accepted by a deter- 
ministic (or nondeterministic) finite automaton is decidable. Since the languages 
accepted by probabilistic and quantum finite automata with bounded error are 
regular [9,8], these problems are decidable, too. 

For finite state transducers the situation is more complicated: In [6] it is 
shown that the emptiness problem for deterministic and nondeterministic fst is 
decidable. In contrast we have 

Theorem 5 The emptiness problem for pfst computing a relation with probability
2/3 is undecidable.

Likewise, the emptiness problem for qfst computing a relation with probability
2/3 is undecidable.

Proof: By reduction to the Post Correspondence Problem [6]: let an instance
$(v_1, \ldots, v_k), (w_1, \ldots, w_k)$ of PCP be given (i.e. $v_i, w_i \in \Sigma^+$). It is to be decided
whether there exists a sequence $i_1, \ldots, i_n$ ($n > 0$) such that

$$v_{i_1} \cdots v_{i_n} = w_{i_1} \cdots w_{i_n}.$$

Construct the following qfst with input alphabet $\{1, \ldots, k\}$. It has the
states $q_0, q_v, q_w$, and $q_{rej}$. The initial transformation produces a superposition
of $q_v, q_w, q_{rej}$, each with amplitude $1/\sqrt{3}$. The unitaries $U_i$ are all the identity,
but the output function is defined as $f_i(q_x) = x_i$, for $x \in \{v, w\}$. The end-marker
maps $q_v, q_w$ to accepting states. It is clear that $i_1, \ldots, i_n$ is a solution
iff $(i_1 \cdots i_n, v_{i_1} \cdots v_{i_n})$ is in the relation computed with probability 2/3 (the
automaton is easily modified so that it rejects when the input is the empty word;
in this way we force $n > 0$).

By replacing the unitaries by stochastic matrices (with entries the squared 
moduli of the corresponding amplitudes) the same applies to pfst. 

Since it is well known that PCP is undecidable, it follows that there can be 
no decision procedure for emptiness of the relation computed by the constructed 
pfst, or qfst, respectively. □ 
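The output distribution of this construction can be written down directly, which makes the reduction easy to check on small instances. The Python sketch below (our illustration; the function name and the example PCP instance are hypothetical) computes $T(\cdot|i_1 \cdots i_n)$ for the pfst variant: the branches $q_v$ and $q_w$ each carry probability 1/3 and print $v_{i_1}\cdots v_{i_n}$ resp. $w_{i_1}\cdots w_{i_n}$, while the remaining 1/3 goes to $q_{rej}$. An output reaches probability 2/3 exactly when the two branches agree.

```python
from fractions import Fraction


def pcp_transducer_distribution(vs, ws, indices):
    """Accepted-output distribution of the reduction on input i_1 ... i_n.

    vs, ws: the PCP lists (v_1..v_k), (w_1..w_k) as Python lists;
    indices: the input word as a list of integers in 1..k (1-based, as in the text).
    """
    if not indices:  # the modified machine rejects the empty input
        return {}
    out_v = "".join(vs[i - 1] for i in indices)  # output of branch q_v
    out_w = "".join(ws[i - 1] for i in indices)  # output of branch q_w
    dist = {}
    for out in (out_v, out_w):
        dist[out] = dist.get(out, Fraction(0)) + Fraction(1, 3)
    return dist  # the missing probability 1/3 is rejection via q_rej


vs, ws = ["a", "ab"], ["aa", "b"]  # hypothetical instance; (1, 2) is a solution
print(pcp_transducer_distribution(vs, ws, [1, 2]))  # {'aab': Fraction(2, 3)}
print(pcp_transducer_distribution(vs, ws, [1]))
```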

Remark 6 Undecidable questions for quantum finite automata were first noted
for “1½-way” automata, i.e. ones which move only to the right on their input,
but may also keep their position on the tape. In [1] it is shown that the
equivalence problem for these is undecidable. The same was proved for 1-way 2-tape
quantum finite automata in [3].



Conjecture 7 The emptiness problem for probabilistic and quantum fst
computing a relation with probability 0.99 is decidable.

The emptiness problem for probabilistic and quantum fst computing a relation
with a single-letter input alphabet, with probability $1/2 + \varepsilon$, is decidable.
To prove this, we would like to apply a packing argument in the space of all
total states, equipped with the above metric. However, this fails because of the
infinite volume of this space (for finite automata it is finite, see [9] and [8]). In
any case, a proof must involve the size of the gap between the upper and the
lower probability point, as Theorem 5 shows that it cannot possibly work
with gap 1/3.

Still, we can prove: 

Theorem 8 If the relation $\mathcal{R}$ is computed by a pfst or a qfst with an isolated
cutpoint, then $\mathrm{Range}(\mathcal{R}) = \{y \mid \exists x\, (x, y) \in \mathcal{R}\}$ is a recursive set (so, for each
specific output, it is decidable whether it is ever produced above the threshold
probability).

We regret having to omit the proof because of the page limit. It can be
found in [5]. □

4 Deterministic vs. Probabilistic Transducers 

Unlike the situation for finite automata, pfst are strictly more powerful than 
their deterministic counterparts: 

Theorem 9 For arbitrary $\varepsilon > 0$ the relation

$$\mathcal{R}_1 = \{(0^m 1^m, 2^m) : m \geq 0\}$$

can be computed by a pfst with probability $1 - \varepsilon$. It cannot be computed by a dfst.

Proof: The idea is essentially from [4]: for a natural number $k$ choose initially an
alternative $j \in \{0, \ldots, k-1\}$, uniformly. Then do the following: repeatedly read
$k$ 0's and output $j$ 2's, until the 1's start (remember the remainder modulo $k$);
then repeatedly read $k$ 1's and output $k - j$ 2's. Compare the remainder modulo
$k$ with what you remembered: if the two are equal, output this number of 2's
and accept; otherwise reject.

It is immediate that on input $0^m 1^m$ this machine outputs $2^m$ with certainty.
However, on input $0^m 1^n$ with $m \neq n$, each $2^l$ receives probability at most $1/k$.

That this cannot be done deterministically is straightforward: assume that a
dfst has produced $f(m)$ 2's after having read $m$ 0's. Because of finiteness there
are $k, l$ such that after reading $k$ 1's (while no 2's were output) the internal state
is the same as after reading $l$ further 1's (while $n$ 2's are output). So, the output
for input $0^m 1^{k + rl}$ is $2^{f(m) + rn}$, and these pairs are either all accepted or all
rejected. Hence they are all rejected, contradicting acceptance for $m = k + rl$. □
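The probabilistic construction in the first part of the proof can be checked by computing its output distribution exactly. The following Python sketch is our own illustration (the function name is hypothetical, and the input is assumed to be already of the form $0^m 1^n$). It enumerates the $k$ equally likely alternatives $j$ and returns the probability of each accepted output $2^l$, keyed by $l$.

```python
from fractions import Fraction


def r1_output_distribution(m: int, n: int, k: int) -> dict:
    """Accepted outputs {l: Pr[output 2^l]} of the Theorem 9 pfst on 0^m 1^n."""
    dist = {}
    if m % k != n % k:  # remainders differ: every alternative rejects
        return dist
    for j in range(k):  # alternative j, chosen with probability 1/k
        # j 2's per block of k 0's, k - j 2's per block of k 1's, then the remainder
        l = j * (m // k) + (k - j) * (n // k) + m % k
        dist[l] = dist.get(l, Fraction(0)) + Fraction(1, k)
    return dist


print(r1_output_distribution(7, 7, 3))  # {7: Fraction(1, 1)}
print(r1_output_distribution(3, 9, 3))  # outputs 2^9, 2^7, 2^5, each with probability 1/3
```

Choosing $k \geq 1/\varepsilon$ makes every wrong output have probability at most $1/k \leq \varepsilon$, which is how the bound $1 - \varepsilon$ arises.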

By observing that the random choice at the beginning can be mimicked 
quantumly, and that all intermediate computations are in fact reversible, we 
immediately get 

Theorem 10 For arbitrary $\varepsilon > 0$ the relation $\mathcal{R}_1$ can be computed by a qfst
with probability $1 - \varepsilon$. □

Note that this puts qfst in contrast to quantum finite automata: in [2] it was 
shown that if a language is recognized with probability strictly exceeding 7/9 
then it is possible to accept it with probability 1, i.e. reversibly deterministically. 
5 ... vs. Quantum Transducers

After seeing a few examples one might wonder if everything that can be done 
by a qfst can be done by a pfst. That this is not so is shown as follows: 

Theorem 11 The relation

$$\mathcal{R}_3 = \{(0^m 1^n 2^k, 3^m) : n \neq k \wedge (m = k \vee m = n)\}$$

can be computed by a qfst with probability $4/7 - \varepsilon$, for arbitrary $\varepsilon > 0$.

Theorem 12 The relation $\mathcal{R}_3$ cannot be computed by a pfst with probability
bounded away from 1/2. In fact, it cannot even be computed with an isolated cutpoint.

Proof (of Theorem 11): For a natural number $l$ construct the following
transducer: from $q_0$ go to one of the states $q_1, q_{j,b}$ ($j \in \{0, \ldots, l-1\}$, $b \in \{1, 2\}$), with
amplitude $\sqrt{3/7}$ for $q_1$ and with amplitude $\sqrt{2/(7l)}$ each for the others. Then
proceed as follows (we assume the input to be of the form $0^m 1^n 2^k$; others are
rejected): for $q_1$ output one 3 for each 0, and finally accept. For $q_{j,b}$ repeatedly
read $l$ 0's and output $j$ 3's (remember the remainder $m \bmod l$). Then repeatedly
read $l$ $b$'s and output $l - j$ 3's (output nothing on the $(3-b)$'s). Compare
the remainder with the one remembered, and reject if they are unequal;
otherwise output this number of 3's. Reading $\$$, perform the following unitary on the
subspace spanned by the $q_{j,b}$ and duplicate states $q'_{j,b}$.

Accepting are all $q'_{j,2}$, rejecting are all $q'_{j,1}$.

It can be checked that the automaton behaves in the desired way. □

Proof (of Theorem 12): By contradiction. Suppose $\mathcal{R}_3$ is computed by a pfst $T$
with isolated cutpoint $\alpha$. It is easily seen that the cutpoint can be shifted to 1/2.

Hence, we may assume that $T$ computes $\mathcal{R}_3$ with probability $\varphi > 1/2$; from
this we shall derive a contradiction. The state set $Q$ together with any of the
stochastic matrices $V_0, V_1, V_2$ is a Markov chain. We shall use the classification
of states for finite Markov chains (see [7]): for $V_i$, $Q$ is partitioned into the set $R_i$
of transient states (i.e. the probability to find the process in $R_i$ tends to 0) and
a number of sets $S_{ij}$ of ergodic states (i.e. once in $S_{ij}$ the process does not leave
this set, and all states inside can be reached from each other, though maybe
only by a number of steps). Each $S_{ij}$ is divided further into its cyclic classes
$C_{ij\nu}$ ($\nu \in \mathbb{Z}_{d_{ij}}$), $V_i$ mapping $C_{ij\nu}$ into $C_{ij,\nu+1}$. By considering sufficiently high
powers $V_i^d$ (e.g. with $d$ the product of all the periods $d_{ij}$) as transition matrices, all these
cyclic sets become ergodic; in fact, $V_i^d$ restricted to each of them is regular.

Using only these powers amounts to concentrating on inputs of the form
$0^m 1^n 2^k$ with $m$, $n$, $k$ multiples of $d$, which we will do from now on. Relabelling, the ergodic
sets of $V_i := V_i^d$ will be denoted $S_{ij}$. Each has its unique equilibrium distribution,
to which every initial one converges: denote it by $\pi_{ij}$. Furthermore, there
are limit probabilities $a(j_0)$ to find the process $V_0$ in $S_{0j_0}$ after a long time, starting
from $q_0$. Likewise, there are limit probabilities $b(j_1|j_0)$ to find the process
$V_1$ in $S_{1j_1}$ after a long time, starting from $\pi_{0j_0}$, and similarly $c(j_2|j_1)$. So, by
the law of large numbers, for large enough $m, n, k$ the probability that $V_0$ has
passed into $S_{0j_0}$ after $\sqrt{m}$ steps, after which $V_1$ has passed into $S_{1j_1}$ after $\sqrt{n}$
steps, after which $V_2$ has passed into $S_{2j_2}$ after $\sqrt{k}$ steps, is arbitrarily close to
$P(j_0, j_1, j_2) = a(j_0)\, b(j_1|j_0)\, c(j_2|j_1)$. (Note that these probabilities sum to one.)

Applying the ergodic theorem (or law of large numbers) and standard tech- 
niques (see [7]) one arrives at the conclusion that, if all inputs with 

k ^ m, accept with output 3'^'" above the outpoint, so must contra- 
dicting the fact that (0'^™, 3^^™) is not in the relation. 

□ 

In general, however, computing with an isolated cutpoint is strictly weaker than
computing with probability bounded away from 1/2 (observe that for finite automata,
probabilistic and quantum, recognizability with an isolated cutpoint is equivalent to
recognizability with probability bounded away from 1/2, see Theorem 4):

Theorem 13 The relation

$$\mathcal{R}_4 = \{(0^m 1^n a, 4^l) : (a = 2 \Rightarrow l = m) \wedge (a = 3 \Rightarrow l = n)\}$$

can be computed by a pfst and by a qfst with an isolated cutpoint (in fact, one
arbitrarily close to 1/2), but not with a probability bounded away from 1/2.

Proof: First the construction (again, only for qfst): initially branch into two
possibilities $c_0, c_1$, each with amplitude $1/\sqrt{2}$. Assume that the input is of the
correct form (otherwise reject), and in state $c_i$ output one 4 for each $i$, ignoring
the $(1-i)$'s. Then, if $a = 2 + i$, accept; if $a = 3 - i$, reject. It is easily seen that
$4^l$ is accepted with probability 1/2 if $(0^m 1^n a, 4^l) \in \mathcal{R}_4$, and with probability 0
otherwise.
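Since each branch evolves deterministically and the two branches stay in orthogonal states, the acceptance probabilities of the construction simply add, each branch contributing $|1/\sqrt{2}|^2 = 1/2$. The following Python sketch (our illustration; the function name is hypothetical, and inputs are assumed to have the form $0^m 1^n a$ with $a \in \{2, 3\}$) returns the accepted outputs $4^l$, keyed by $l$.

```python
from fractions import Fraction


def r4_output_distribution(m: int, n: int, a: int) -> dict:
    """Accepted outputs {l: Pr[output 4^l]} of the Theorem 13 construction on 0^m 1^n a."""
    dist = {}
    for i, count in ((0, m), (1, n)):  # branch c_i prints one 4 per symbol i
        if a == 2 + i:  # c_0 accepts on a = 2, c_1 accepts on a = 3; otherwise reject
            dist[count] = dist.get(count, Fraction(0)) + Fraction(1, 2)
    return dist


print(r4_output_distribution(2, 5, 2))  # {2: Fraction(1, 2)}
print(r4_output_distribution(2, 5, 3))  # {5: Fraction(1, 2)}
```

Every pair in $\mathcal{R}_4$ thus receives probability 1/2 and every other pair probability 0, so any cutpoint strictly between 0 and 1/2, for instance 1/4, is isolated.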

That this cannot be done with probability above 1/2 is intuitively clear:
the machine has to produce some output (because of memory limitations), but
it cannot decide whether to output $4^m$ or $4^n$ until seeing the last symbol.
Formally, assume that $|m - n| > 4t$, with $t = \max_{a,q} |f_a(q)|$. If

$$T_{\text{¢}0^m1^n2\$}((q_0, \varepsilon), 0, 0)[4^m] = T(4^m | 0^m 1^n 2) \geq 1/2 + \delta,$$

necessarily

$$T_{\text{¢}0^m1^n}((q_0, \varepsilon), 0, 0)[4^m] + T_{\text{¢}0^m1^n}((q_0, \varepsilon), 0, 0)[Q_{non} \times 4^{[m-2t,\, m+2t]}] \geq 1/2 + \delta.$$

But this implies

$$T_{\text{¢}0^m1^n}((q_0, \varepsilon), 0, 0)[4^n] + T_{\text{¢}0^m1^n}((q_0, \varepsilon), 0, 0)[Q_{non} \times 4^{[n-2t,\, n+2t]}] \leq 1/2 - \delta,$$

hence

$$T_{\text{¢}0^m1^n3\$}((q_0, \varepsilon), 0, 0)[4^n] = T(4^n | 0^m 1^n 3) \leq 1/2 - \delta,$$

contradicting $(0^m 1^n 3, 4^n) \in \mathcal{R}_4$. □

To conclude from these examples, however, that quantum is even better than 
probabilistic, would be premature: 
Theorem 14 The relation

$$\mathcal{R}_5 = \{(wx, x) : w \in \{0, 1\}^*, x \in \{0, 1\}\}$$

cannot be computed by a qfst with an isolated cutpoint. (Obviously it is computed
by a pfst with probability 1, i.e. by a dfst.)

Proof: The construction of a dfst computing the relation is straightforward. To
show that no qfst doing this job exists, we recall from [8] that $\{0, 1\}^*0$ is not
recognized by a 1-way quantum finite automaton with probability bounded away
from 1/2, and use Theorem 4 for this language. □
Abstract. Stemming is a widely accepted practice in Document Information
Retrieval Systems (DIRs), because it is more beneficial than harmful [3] and
has the virtue of improving retrieval efficiency by reducing the size of the term
index. We will present a technique of semi-automatic stemming that is designed
for the JAVA environment. The method works without deep knowledge of the
grammar rules of a language, in contrast to the well-known Porter's algorithm [8].
From that point of view, we can call our method universal, i.e. applicable to many
languages. We will also present tests to show the quality of the method and its
error rate.



1 Introduction 

Most document information retrieval systems operate with free-text documents,
where we cannot guarantee that terms appear in their base forms. That situation can be
difficult, because a user may put his query in any form and we have to solve that
query against a huge set of documents that use any valid variant of the terms of
that query. A DIR system has to recognize these variants somehow. The solution we
choose is rather simple: the problem can be overcome by substituting the words
with their respective base forms (so-called “stems”). The process of
transformation is called “stemming”, and the module of a DIR system that provides this
functionality is called a “lemmatizer”. In this paper we will suppose that the
lemmatizer does not offer anything other than the stemming function. In general
this is not true, because a lemmatizer can provide a rich text analysis (e.g.
recognition of nouns, verbs etc.) that can improve the quality of a DIR system.

Stemming is beneficial because it reduces the number of terms (and also the size
of data streams) in a system. Our goal is not to develop a perfect, error-free
stemming method. We need to provide a method that transforms a huge set of
words to a small set of stems (the corpus of a system) without noise. Noise
can be defined as the situation when a word is transformed to the stem of another
word. If we allow such noise, we lose the meaning of a document.

* This work was supported by the GACR grant N° 201/00/1031. 
There are four main types of stemming strategies: table lookup (equivalent to our
basic reduction method), successor variety, n-grams, and affix removal (comparable
to our best reduction methods). Table lookup consists simply of looking up the
stem of a word in a table. Our basic reduction method does the same thing
with a small modification: it looks up the edit command that transforms a
word to its stem. But both methods depend on a table (of word-stem pairs)
for the whole language, and such a table might not be available. Successor variety is more
complex and is based on the determination of morpheme boundaries (see
structural linguistics for more). The n-gram strategy is based on the identification of n-grams.
Affix removal stemming is simpler, and can be implemented efficiently if
we know the rules for affix removal in the language. Our method tries to solve the
problem when we cannot supply the rules.

The tacit assumption in our paper is that we have a sample set (a so-called
dictionary) of words and their stems. The dictionary cannot be generated
automatically, so we call our method “semi-automatic”. We will show how we
can define a set of transformation commands (P-commands) and how we can
store the commands with the original words. If such a structure exists, we can create
the stem of a word, because we have the word and a transformation command
that builds the stem. We will present solutions to two problems: How can we
reduce the original huge data structure? And how can we transform words that
were not included in the original dictionary?

The method should be usable in a JAVA environment. What does this mean for the
implementation? First, the data structure of the method should be small and
static; we would like to keep it under 64 kB.
Second, the data structure must support threads well, for better
scalability. Third, processing a document of length l should take time
O(l). Fourth, we must achieve a very small overhead in processing
time.



2 Stemming 

We use the definition of stemming that has been presented by Boot [1]: “Stemming
is the transformation of all inflected word forms contained in a text to
their dictionary look-up form”. The dictionary look-up form is called a lemma.
A stem is the portion of a word which is left after removal of its prefixes and
suffixes. In our case, we do not need to distinguish strictly between stem and lemma. If
there are several dictionary look-up forms, we will choose just one of them and
call it the stem.

Let us define a function stem that transforms a word to its stem. The function
stem mostly transforms a word to its stem by removing prefixes and
suffixes. Sometimes the function must also insert dead characters (e.g. 'e' in
English). Our tests showed that we should also allow a 'replace' action in the stem
function; in some cases that action combines a removal and an insertion. The
patch command (P-command) represents the functionality of the stem function.
We must ensure that P-commands are universal, i.e. that P-commands can transform
any word to its stem.

Our solution is based on the Levenshtein metric [6]¹, which produces a P-command as
the minimum cost path in a directed graph (see below). The metric computes
the distance between two strings (in our case between a word and its stem)
as the minimum cost of a sequence of edit operations. These basic operations
will be called partial patch commands (PP-commands) in this paper.
A PP-command is responsible for an atomic transformation of a string. As
explained above, we need three basic operations: removal, which
deletes a sequence of characters; insertion, which inserts a character; and substitution,
which rewrites a character. Finally, we also need a NOP (no-operation)
PP-command that skips a sequence of characters.
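For reference, the distance underlying this approach can be computed by the textbook dynamic program below. This Python sketch assumes unit cost for every removal, insertion, and substitution of a single character; the costs and graph used in this paper may differ, and the sketch computes only the distance, not the P-command itself.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character removals, insertions and
    substitutions turning a into b (unit costs assumed)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]  # turning a[:i] into the empty string takes i removals
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # remove ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


print(levenshtein("provides", "provide"))  # 1
```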

One can imagine the P-command as a program for an operator (editor) that rewrites one string into another. We slightly redefine the three basic operations in order to specify the cursor movement precisely: removal deletes a sequence of characters starting at the current cursor position and moves the cursor to the next character (the length of the sequence is the parameter of this PP-command); insertion inserts a character ch and the cursor does not move (the character ch is the parameter); substitution rewrites the character at the current cursor position to the character ch and moves the cursor to the next character (ch is the parameter).

Note 1. We need not store the final NOP command, because it does not change the string: “remove 3 characters, skip (NOP) 6 characters” is equivalent to “remove 3 characters”. On the other hand, we cannot omit a significant NOP command. In “remove three characters, skip one character, insert ’abc’ string, skip four characters” only the final skip can be dropped, so this P-command is optimized to “remove three characters, skip one character, insert ’abc’ string”.

We apply P-commands backward (right to left), because words have more suffixes than prefixes. This can reduce the set of P-commands, because the final NOP is not stored; that NOP only moves the cursor over the rest of the string without any change. For example, stem(eats) = eat and stem(provides) = provide. If we applied the P-commands left to right, we would need two commands, “skip three characters, remove one character” and “skip seven characters, remove one character”; applied right to left, the single command “remove one character” serves both words.



PP-commands are written as a pair of characters: the first character represents an operation (instruction) and the second one holds its parameter. We use these marks for the basic operations: ’D’ - removal, ’I’ - insertion, ’R’ - substitution, ’-’ - NOP. For removal and NOP PP-commands the parameter must hold a number². The encoding we chose is ’a’ = 1, ’b’ = 2, ’c’ = 3, etc. Thus an original string o is transformed into a string s by the instructions of a P-command p, and we define the function patch that covers this process: patch(o, p) = s.

¹ The Levenshtein metric was chosen because it allows the same modifications of a term as Porter’s algorithm or any affix-removal algorithm. The metric is also appropriate for English and Czech (after conversion to US-ASCII).

Example 1. “Skip three characters, remove one character” is represented by the instructions “-c” and “Da”. We use the shorter notation “-cDa” for the complete P-command. Further examples: patch(’ABC’, ’Rv-aRs’) = ’sBv’ and patch(’AB’, ’IaIbDb’) = ’ba’.
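A minimal Python sketch of the function patch, written to make the cursor semantics concrete. The details are our reading of the text and of Example 1, not the authors' code: we assume that the cursor starts on the last character, that a removal deletes the characters ending at the cursor, and that an insertion places its character just right of the cursor (next to the part already processed). Under these assumptions the sketch reproduces both results of Example 1 and the single command ’Da’ for stem(eats) and stem(provides).

```python
def patch(original: str, command: str) -> str:
    """Apply the P-command `command` to `original`, working right to left.

    Each PP-command is two characters: an operation ('D' removal, 'I' insertion,
    'R' substitution, '-' NOP) followed by its parameter. Numeric parameters are
    encoded as 'a' = 1, 'b' = 2, 'c' = 3, ...
    """
    s = list(original)
    pos = len(s) - 1                       # the cursor starts on the last character
    for k in range(0, len(command) - 1, 2):
        op, param = command[k], command[k + 1]
        count = ord(param) - ord("a") + 1  # only meaningful for 'D' and '-'
        if op == "-":                      # NOP: skip `count` characters
            pos -= count
        elif op == "R":                    # substitution: rewrite the character under the cursor
            if pos < 0:
                raise ValueError("substitution before the start of the word")
            s[pos] = param
            pos -= 1
        elif op == "D":                    # removal: delete `count` characters ending at the cursor
            if pos - count + 1 < 0:
                raise ValueError("removal past the start of the word")
            del s[pos - count + 1:pos + 1]
            pos -= count
        elif op == "I":                    # insertion: the new character goes right of the cursor,
            s.insert(pos + 1, param)       # i.e. before the part already processed; cursor stays
        else:
            raise ValueError(f"unknown PP-command {op!r}")
    return "".join(s)


print(patch("ABC", "Rv-aRs"))                          # sBv
print(patch("AB", "IaIbDb"))                           # ba
print(patch("eats", "Da"), patch("provides", "Da"))    # eat provide
```

The right-to-left direction is what lets one short command such as ’Da’ serve many words with different prefixes.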



3 Building P-Commands 

Our current implementation [5] uses a naive algorithm³ to build P-commands. A speedup can be achieved with more efficient algorithms (see [2]), but that is not the aim of this paper.

Definition 1 (Levenshtein [6]). Let $G = (N, E, m, w) = (N, E)$ be a directed graph for two strings $X = x_1 x_2 \ldots x_{|X|}$ and $Y = y_1 y_2 \ldots y_{|Y|}$, where
$$N = \{[i, j] \mid (0 \le i \le |X|) \wedge (0 \le j \le |Y|)\},$$
$$E = \{([i_1, j_1]; [i_2, j_2]) \mid (0 \le i_2 - i_1 \le 1) \wedge (0 \le j_2 - j_1 \le 1) \wedge ([i_1, j_1] \ne [i_2, j_2]) \wedge ([i_1, j_1], [i_2, j_2] \in N)\},$$
$Z = [0, 0]$ and $K = [|X|, |Y|]$.

The painting function $m\colon E \to \text{PP-command}$ is given for every $e = ([i_1, j_1]; [i_2, j_2]) \in E$ by
$$m(e) = \text{Substitution} \stackrel{df}{\iff} (j_2 = j_1 + 1) \wedge (i_2 = i_1 + 1) \wedge (x_{i_2} \ne y_{j_2}),$$
$$m(e) = \text{NOP} \stackrel{df}{\iff} (j_2 = j_1 + 1) \wedge (i_2 = i_1 + 1) \wedge (x_{i_2} = y_{j_2}),$$
$$m(e) = \text{Removal} \stackrel{df}{\iff} (i_1 = i_2) \wedge (j_2 = j_1 + 1),$$
$$m(e) = \text{Insertion} \stackrel{df}{\iff} (j_1 = j_2) \wedge (i_2 = i_1 + 1).$$

The penalty (weight) of a path $P = (p_1, p_2, \ldots, p_n)$ from $Z$ to $K$ is the cost of the path, $w(P) = \sum_{k=1}^{n} w(m(p_k))$.



A P-command is represented by a path through the directed graph G = (N, E). For example, the graph for transforming ’ABCD’ into ’BDD’ is presented in Fig. 1. We search for the shortest weighted path from $Z \in N$ to $K \in N$; any path from Z to K represents one P-command that transforms the term ’ABCD’ into ’BDD’ (in the case of Fig. 1). In the figure, horizontal edges represent the PP-command ’Removal’, vertical edges ’Insertion’, solid diagonal edges ’Substitution’ and the remaining edges ’NOP’. Each edge of the shortest weighted path through the directed graph can be encoded individually. The coding is simple and is based on the function m and on our coding of PP-commands (see the marks for the basic operations above). We define the attribute of the PP-command for an edge type $m(([a, b]; [u, v]))$: for ’Substitution’ the attribute is the new character; for ’Removal’ it is the number 1; for ’Insertion’ it is the inserted character; for ’NOP’ it is the number 1. A block (sequence) of ’Removal’ or ’NOP’ PP-commands can be optimized; for example, our algorithm rewrites ’-a-a-a’ to ’-c’.

Fig. 1. Directed graph for building the P-command that transforms ’ABCD’ to ’BDD’.

² The number varies between 1 and 32 if we suppose that words are shorter than 32 characters. This is a realistic assumption.

³ Our implementation works with complexity $O(l_1 l_2)$, where $l_1$ is the length of the original string and $l_2$ is the length of its final form (stem). There are many algorithms with better complexity.

Our tests showed that it is profitable to accept the path with more ’Removal’ edges, provided that w(’Removal’) is low; complete results are given in Table 1. We also prefer the path that starts with a longer ’Removal’ instruction.



Table 1. Number of P-commands for various weightings.

w(’Removal’)   w(’Insertion’)   w(’Substitution’)   w(’NOP’)   #P-commands
1              1                1                   0          145
1              1                2                   0          149
1              2                2                   1          162
1              3                1                   0          161
0              1                2                   0          149
2              3                2                   0          175



Example 2. The word ’momww’ is transformed to its stem by either of the P-commands ’Da’ and ’-aDa’. The first command is more universal, so it is more likely to be reused in another transformation; preferring it can make the set of P-commands smaller.
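The construction can be sketched as a textbook weighted edit-distance computation. This Python sketch is our illustration, not the authors' naive implementation: it reverses both strings so that the resulting script reads right to left, takes the four edge weights of Table 1 as parameters (the defaults are the first row), follows one cheapest path while preferring ’Removal’ on ties, merges runs of ’Removal’ and ’NOP’, and drops the final NOP. The name build_pcommand and its parameters are hypothetical.

```python
def build_pcommand(word: str, stem: str, w_rem: int = 1, w_ins: int = 1,
                   w_sub: int = 1, w_nop: int = 0) -> str:
    """Cheapest P-command turning `word` into `stem` (weighted edit distance)."""
    rw, rs = word[::-1], stem[::-1]        # P-commands are applied right to left
    n, m = len(rw), len(rs)
    inf = float("inf")
    # cost[i][j]: cheapest way to rewrite rw[i:] into rs[j:]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[n][m] = 0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i < n:                                    # Removal
                cost[i][j] = min(cost[i][j], w_rem + cost[i + 1][j])
            if j < m:                                    # Insertion
                cost[i][j] = min(cost[i][j], w_ins + cost[i][j + 1])
            if i < n and j < m:                          # NOP or Substitution
                step = w_nop if rw[i] == rs[j] else w_sub
                cost[i][j] = min(cost[i][j], step + cost[i + 1][j + 1])

    # Follow one cheapest path from Z to K, preferring Removal on ties.
    ops, i, j = [], 0, 0
    while (i, j) != (n, m):
        diagonal = i < n and j < m
        step = (w_nop if rw[i] == rs[j] else w_sub) if diagonal else inf
        if i < n and cost[i][j] == w_rem + cost[i + 1][j]:
            ops.append(["D", 1])
            i += 1
        elif diagonal and cost[i][j] == step + cost[i + 1][j + 1]:
            ops.append(["-", 1] if rw[i] == rs[j] else ["R", rs[j]])
            i, j = i + 1, j + 1
        else:
            ops.append(["I", rs[j]])
            j += 1

    merged = []                            # e.g. '-a-a-a' -> '-c', 'DaDa' -> 'Db'
    for op in ops:
        if merged and op[0] in "D-" and merged[-1][0] == op[0]:
            merged[-1][1] += op[1]
        else:
            merged.append(op)
    if merged and merged[-1][0] == "-":    # a final NOP changes nothing (Note 1)
        merged.pop()
    return "".join(op + (chr(ord("a") + arg - 1) if op in "D-" else arg)
                   for op, arg in merged)


print(build_pcommand("momww", "momw"))     # Da
print(build_pcommand("ABC", "sBv"))        # Rv-aRs
```

Because ties are broken in favour of ’Removal’, the sketch picks ’Da’ rather than ’-aDa’ for Example 2 (taking ’momw’ as the stem, which is what both commands produce), and it reproduces the P-command ’Rv-aRs’ of Example 1.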



4 Data Structure 

The dictionary (words and their P-commands⁴) is stored in a trie. The key of the trie is an original word (see Fig. 2), and the P-commands are stored in leaf nodes. The goal of this section is to reduce that trie. We ran tests with an English dictionary of 25802 words (14760 stems), which is available on the homepage of the JAVA search engine [5]. The original, unreduced trie takes 85170 nodes, and there are 140 P-commands. The 145 P-commands mentioned in the previous section were calculated by a case-sensitive algorithm, which is not used for practical reasons.

Fig. 2. Trie. Dictionary representation (leaves shown in the figure: lemma = act, command = remove last character; lemma = arm, command = remove last character).

⁴ The dictionary of words and their stems is now stored as a dictionary of words and their P-commands. This only changes the representation of the dictionary and helps us reduce the data structure.

The trie finds a P-command for each word⁵ of length $l$, and we can then create the stem with time complexity $O(l)$. There are two main disadvantages. First, the trie is huge: one node takes around 3 bytes of memory, which leads to a trie object of about 250 kB. Second, we cannot stem a term that was not in the original dictionary. The reductions described below solve both problems.

We suppose that the trie is implemented by a matrix $M_{m \times n}$, where m is the number of inner nodes and n is the number of characters that appear in keys. Each inner node u is represented by the array of edges that start in that inner node. Each cell holds two values: a pointer rf to the next inner node ($M_{ij}.rf$) and a P-command com ($M_{ij}.com$). The cell $b = M_{u,o}$ represents an edge $e = (u, v, o)$ labelled with character o from inner node u to node $v = b.rf$, or a leaf node that holds the value $b.com$ (see Fig. 3).



Fig. 3. Data structure of a trie (a matrix whose columns are indexed by characters such as a, i, e, o, u, m, n, s, v, j).
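A minimal sketch of this matrix representation. It assumes one Python dictionary per row in place of a full row of n cells (a missing character stands for a cell whose com and rf are both null) and takes row 0 as the root. The names build_matrix and lookup and the sample data are hypothetical; back=True reads keys right to left, which gives the ’back trie’ of Section 5.

```python
def build_matrix(dictionary: dict, back: bool = False) -> list:
    """Trie as a list of rows; row u maps a character to a cell (com, rf).

    Only non-null cells are stored. Row 0 is the root. With back=True the keys
    are read right to left (the 'back trie').
    """
    rows = [{}]
    for word, command in dictionary.items():
        key = word[::-1] if back else word
        u = 0
        for k, ch in enumerate(key):
            com, rf = rows[u].get(ch, (None, None))
            if k == len(key) - 1:
                rows[u][ch] = (command, rf)      # the P-command is stored in the cell
            else:
                if rf is None:                   # create a new inner node
                    rows.append({})
                    rf = len(rows) - 1
                rows[u][ch] = (com, rf)
                u = rf
    return rows


def lookup(rows: list, word: str, back: bool = False):
    """P-command of `word`, or None if the word is not in the trie."""
    key = word[::-1] if back else word
    u = 0
    for k, ch in enumerate(key):
        cell = rows[u].get(ch)
        if cell is None:
            return None
        com, rf = cell
        if k == len(key) - 1:
            return com
        if rf is None:
            return None
        u = rf
    return None


words = {"acts": "Da", "act": "", "arms": "Da", "eats": "Da"}   # hypothetical data
forward = build_matrix(words)
back_trie = build_matrix(words, back=True)
print(len(forward), lookup(forward, "acts"), lookup(back_trie, "arms", back=True))   # 9 Da Da
```

In the back trie all three words ending in ’s’ share the first edge from the root, which is why that variant suits suffix-heavy languages.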



Definition 2. Let u and v be vectors. $R(u, v)$ denotes the join collision of the vectors u and v; it is a predicate with values in {true, false}. When $R(u, v) = \text{true}$, the join of u and v produces a collision.



⁵ A word of the original dictionary.
Our reduction algorithms use a simple merging strategy: they merge rows of the matrix if this does not create a collision. The term ’collision’ is defined later, because it differs between the algorithms.

The algorithm does not produce the smallest trie, but our tests showed that the final trie is good enough for our purposes. The main strategy is summarized in the following algorithm.

Algorithm (Reduction algorithm)

1. read matrix M
2. for j = m downto 2 do begin
3.    r = M_j
4.    for i = j - 1 downto 1 do
5.       if we can merge (or rather join) M_i and r without a collision then begin
6.          join M_i and r: M_i = join(M_i, r), remove the row r
7.          repair references to the row r (they must point to M_i)
8.       end
9. end
10. reorder the rows of matrix M, because steps 6-7 produce gaps
11. store matrix M



Definition 3. Let x and y be vectors $((com; rf))_i$. Then $join(x, y) = z$, where z is a vector $((com; rf))_i$ with
$$z_i.field = \begin{cases} x_i.field & \text{iff } x_i.field \ne null,\\ y_i.field & \text{iff } x_i.field = null, \end{cases} \qquad field \in \{com; rf\}.$$



Basic Reduction. The intuition behind this method of joining two rows without any collision is naive: we join two rows (arrays) only if they are equal, i.e. the collision is $R(u, v) = \neg \bigwedge_i (u_i = v_i)$. This method mainly joins only the leaves of the original trie, because the probability of a collision grows as we move from the leaves to rich inner nodes. Obviously the ’learning’ factor will not be high, but we mention the method here because it serves as the baseline (worst) solution. The question is to what degree, if any, we can improve on it.
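The reduction algorithm and Definition 3 can be sketched as follows. We assume one Python dictionary per row (a missing character means a cell whose com and rf are both null) and 0-based rows with the root as row 0; once r has been joined we stop scanning i, since r no longer exists, which is our reading of the pseudocode. With basic_collides plugged in, the pass performs the basic reduction. All names and the sample matrix are hypothetical.

```python
def join(x: dict, y: dict) -> dict:
    """Definition 3: each field of z is taken from x unless it is null there."""
    z = {}
    for ch in x.keys() | y.keys():                    # a missing cell is all-null
        x_com, x_rf = x.get(ch, (None, None))
        y_com, y_rf = y.get(ch, (None, None))
        z[ch] = (x_com if x_com is not None else y_com,
                 x_rf if x_rf is not None else y_rf)
    return z


def basic_collides(u: dict, v: dict) -> bool:
    """Basic reduction: rows may be joined only if they are equal."""
    return u != v


def reduce_rows(rows: list, collides=basic_collides) -> list:
    """One merging pass over the rows (0-based here; row 0 is the root and is kept)."""
    rows = [dict(r) for r in rows]
    alive = [True] * len(rows)
    for j in range(len(rows) - 1, 0, -1):             # step 2: j from the last row down
        for i in range(j - 1, -1, -1):                # step 4: i from j - 1 down
            if not collides(rows[i], rows[j]):
                rows[i] = join(rows[i], rows[j])      # step 6: join, drop row j
                alive[j] = False
                for row in rows:                      # step 7: repair references to j
                    for ch, (com, rf) in list(row.items()):
                        if rf == j:
                            row[ch] = (com, i)
                break                                 # row j no longer exists
    new_index = {}                                    # step 10: close the gaps
    for old, ok in enumerate(alive):
        if ok:
            new_index[old] = len(new_index)
    return [{ch: (com, None if rf is None else new_index[rf])
             for ch, (com, rf) in rows[old].items()}
            for old in new_index]


# Hypothetical trie for acts, arms, eats (each with P-command 'Da').
M = [{"a": (None, 1), "e": (None, 6)}, {"c": (None, 2), "r": (None, 4)},
     {"t": (None, 3)}, {"s": ("Da", None)}, {"m": (None, 5)},
     {"s": ("Da", None)}, {"a": (None, 7)}, {"t": (None, 8)},
     {"s": ("Da", None)}]
print(len(M), "->", len(reduce_rows(M)))              # 9 -> 7
```

In this example the three identical ’s’ leaves collapse into one row, but the ’t’ row on the path of eats stays separate from the equal-looking row on the path of acts: it is compared before its child is redirected. This illustrates why a single pass does not always produce the smallest trie.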



Tight Reduction. The intuition behind this method is also very simple. The previous method required two rows to agree perfectly. Now we allow a join whenever there is no collision among the non-null values of the rows. For example, suppose a trie contains two rows (inner nodes) u and v, with an edge from u to g labelled by a character $c_u$ and an edge from v to h labelled by a character $c_v$. There is no collision if $c_u \ne c_v$, whereas the basic reduction would require $(c_u = c_v \wedge g = h)$. The collision now becomes
$$R(u, v) = \neg \bigwedge_i \bigl(u_i = v_i \vee \neg(u_i \bowtie_{com} v_i \vee u_i \bowtie_{rf} v_i)\bigr),$$
where $u_i \bowtie_{field} v_i \stackrel{df}{\iff} (u_i.field \ne null \wedge v_i.field \ne null \wedge u_i.field \ne v_i.field)$.
Obviously we lose some information: we can no longer recognize which words belong to the original dictionary. On the other hand, we can produce stems of words that we have not seen. The walk through a trie after tight reduction must be modified: because the reduction moves values from leaves to inner nodes, we must handle that situation. We solved the problem with a naive modification of the trie algorithm. By default, we go through the trie and, if we stop in a leaf, the leaf value is returned. If we cannot return that value (i.e. the next step is not defined in the trie), the last value met in an inner node of the trie is returned.
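A sketch of the tight collision test and of the modified walk, again assuming one Python dictionary per row (missing cells are null) and a back trie. tight_collides could replace the equality test of the basic reduction in a row-merging pass; lookup_tight returns the leaf value when the walk ends in a leaf and otherwise the last P-command met on the way. The names and the tiny matrix R are hypothetical, not results from the paper.

```python
def tight_collides(u: dict, v: dict) -> bool:
    """Rows collide iff some shared cell has two different non-null com or rf values."""
    for ch in u.keys() & v.keys():       # absent cells are null and cannot conflict
        (u_com, u_rf), (v_com, v_rf) = u[ch], v[ch]
        if u_com is not None and v_com is not None and u_com != v_com:
            return True
        if u_rf is not None and v_rf is not None and u_rf != v_rf:
            return True
    return False


def lookup_tight(rows: list, word: str, back: bool = True):
    """Walk of a reduced trie: leaf value if reached, else the last value met."""
    key = word[::-1] if back else word
    u, last = 0, None
    for ch in key:
        cell = rows[u].get(ch)
        if cell is None:                 # next step undefined: fall back
            return last
        com, rf = cell
        if com is not None:
            last = com
        if rf is None:                   # stopped in a leaf
            return last
        u = rf
    return last


# Hypothetical back trie after tight reduction: a final 's' usually means 'Da',
# a final 'ss' means "no change".
R = [{"s": ("Da", 1)}, {"s": ("", None)}]
print([lookup_tight(R, w) for w in ("dogs", "class", "cat")])   # ['Da', '', None]
```

The word ’dogs’ never appears in R, yet the walk returns ’Da’ from the inner node it passed; this is the generalization that the tight reduction buys at the cost of exact dictionary membership.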



Multilevel Trie. It seems that we can generate smaller tries when we have a small set of P-commands. One might also recommend a two-level algorithm: the first level would recognize prefixes and the second one suffixes. Our P-commands can test both approaches, because the commands are easily disassembled: the first PP-commands of all P-commands form the first group (level 1), the second ones form level 2, and so on. Our approach is stronger than prefix/suffix recognition, because it allows prefixes and suffixes to be disassembled into their atomic forms. At each level we can try and test both reduction methods (basic and tight).
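Disassembling P-commands into levels is mechanical, since every PP-command is exactly two characters long. The sketch below (hypothetical names and data) builds, for each level k, a dictionary from words to their k-th PP-command; each level would then get its own trie and its own reduction. We assume that a word missing from level k has no further PP-commands, which is one possible way to mark the end of a command.

```python
def split_levels(dictionary: dict) -> list:
    """Level k maps each word to the k-th PP-command of its P-command."""
    levels = []
    for word, command in dictionary.items():
        pps = [command[k:k + 2] for k in range(0, len(command), 2)]  # 2 chars each
        for level, pp in enumerate(pps):
            if level == len(levels):
                levels.append({})
            levels[level][word] = pp
    return levels


levels = split_levels({"eats": "Da", "ABC": "Rv-aRs"})   # hypothetical data
print(levels)   # [{'eats': 'Da', 'ABC': 'Rv'}, {'ABC': '-a'}, {'ABC': 'Rs'}]
```

Concatenating a word's entries level by level restores its original P-command.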



5 Results 

We use an English dictionary that contains 14760 stems of 25802 words (available on the JAVA search engine homepage [5]). The dictionary was converted to lowercase. We also preferred ’Removal’ over ’NOP’, and we operated with 140 P-commands. The original trie (without reduction) contained 85170 nodes (60097 inner nodes). The dictionary generated 29 collisions, because 29 words were transformed to more than one stem.

We also tried two kinds of trie. The first one reads the key left to right; it is a classic trie, which we call the ’forward trie’. The second one reads the key right to left (in reverse order); we call it the ’back trie’. The second variant is suitable for languages where we expect more suffixes (e.g. English). The results of each method are presented in Table 2, and a more detailed view of the multilevel reduction is presented in Table 3.



Table 2. Results (number of nodes/number of inner nodes) of the methods in this paper.

Method              Forward trie     Back trie
original trie       85170/60097      72551/50592
basic reduction     26716/11895      37725/17401
tight reduction     14431/1307       12714/977
multilevel, basic   194013/139310    159147/111158
multilevel, tight   28377/2310       28807/2053
Table 3. Number of nodes (total/inner nodes) for each level of the multilevel trie (BR - basic reduction, TR - tight reduction).

Trie level   BR-Forward       BR-Back          TR-Forward    TR-Back
1            85170/60097      72551/50592      14216/1163    13783/930
2            85156/60085      72485/50537      12174/937     13070/906
3            22080/17839      12904/9089       1551/131      1615/153
4            1048/834         782/605          293/47        178/32
5            416/334          280/212          72/14         91/16
6            74/63            69/58            33/10         29/8
7            45/38            52/45            22/4          25/4
8            12/10            12/10            8/2           8/2
9            12/10            12/10            8/2           8/2
total        194013/139310    159147/111158    28377/2310    28807/2053



The point is that we can transform any word of the original dictionary to its stem without error (apart from noise, i.e. the collisions mentioned above, which we ignore here). How does the method work when we do not operate with a complete dictionary? That is the question we must answer.

Note 2. We do not cross-validate against other methods, because we could not guarantee valid input terms for testing. If we could, we would simply append those terms to our dictionary, and our method would then return no wrong results (errors), because any trie after reduction still resolves the words of the original dictionary. We could, on the other hand, test the speed of the methods, but from a theoretical point of view we know that the complexity of our algorithms is linear (when the maximal length of stems is constant).



Error-Rate. The test was made with an English dictionary and our best method (back trie). Some words of the dictionary were hidden while the trie was built, but the test was run against the full dictionary (see Table 4). For example, when we built a trie, we left out a random 10% of the words in the dictionary and used them for measuring the error rate. The table presents approximate values over 100 runs. Our tests cannot cover all real situations in DIS; another (better) method of testing would be more complex, which is why it cannot be introduced within the scope of this paper.



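Table 4. Error rate of the back trie.

Method             Unknown words in input (%)   Error rate (%)   #P-commands
tight reduction    10                           3                134
tight reduction    15                           5                127
tight reduction    20                           7                128
tight reduction    30                           11               119
basic reduction    10                           10               133
basic reduction    15                           14               129
basic reduction    20                           20               127
basic reduction    30                           30               123
multilevel tight   10                           5                133
multilevel tight   15                           7                129
multilevel tight   20                           10               128
multilevel tight   30                           15               125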
Note 3 (Table 4). A dictionary has W words. We hide p (%) of the words when the trie is built (unknown words in input). The remaining set of W·(1 − p) words is transformed into a set of stems by a set of P-commands (#P-commands). That trie is optimized by our methods. Then we test the trie (after reduction) with the full set of W words. Obviously we produce C wrong stems, where C ≤ W·p. The error rate is then $\frac{C}{W} \cdot 100\,\%$.
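The evaluation protocol of Note 3 can be sketched as follows, assuming that the error rate is C/W · 100 % and that a stemmer is modelled as a function from a word to a stem. The names error_rate, build and memorize are hypothetical. memorize is a deliberately naive stand-in that, like a trie with no ability to generalize, fails on every hidden word, so its error rate equals the hidden fraction p; this roughly matches the basic-reduction rows of Table 4.

```python
import random


def error_rate(dictionary: dict, build, p: float, seed: int = 0) -> float:
    """Hide a fraction p of the words, build a stemmer from the rest, test on all W.

    `dictionary` maps words to stems and `build(known)` returns a function
    word -> stem (or None). The result is C / W * 100, C = number of wrong stems.
    """
    words = sorted(dictionary)
    rng = random.Random(seed)
    hidden = set(rng.sample(words, round(p * len(words))))
    known = {w: s for w, s in dictionary.items() if w not in hidden}
    stem = build(known)
    wrong = sum(1 for w in words if stem(w) != dictionary[w])   # C
    return 100.0 * wrong / len(words)


def memorize(known: dict):
    """Toy stemmer that only knows the words it was built from."""
    return known.get


toy = {w: w[:-1] for w in ["acts", "arms", "bags", "cats", "cups",
                           "dogs", "eats", "maps", "pens", "rats"]}
print(error_rate(toy, memorize, 0.3))   # 30.0
```

Averaging such runs over many seeds corresponds to the 100 runs behind Table 4.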



Conclusions 

The major intention of this paper was to provide a universal stemming technique that can be easily implemented in JAVA. The data structures we used are static and small, and they can be used in a multithreaded environment. In combination, the methods we have described reduce two important resources, processing time and memory space, simultaneously.

Another important feature is the semi-automatic processing of any language. We do not need to study the language, because the methods presented here learn common prefixes and suffixes without human intervention. On the other hand, we cannot say that the process is fully automatic, because the methods are based on a dictionary of words and stems that must be prepared by a linguist. The good point is that the dictionary need not be perfect or rich (see the error rate). Furthermore, we have shown that the algorithm that removes suffixes and prefixes in two rounds does not save memory space (see the multilevel trie) and can even have the opposite effect.

The concept of semi-automatic stemming is being integrated into the existing DIR system in JAVA (Perestroika engine [5]). The implementation of the engine shows that there are some open questions that we did not address in this paper. Can we realize a reduction algorithm that is both optimal and fast? Can we achieve better results when we accept only frequent P-commands?
Abstract. The reconstruction of discrete two-dimensional pictures from their projections is one of the central problems in the areas of medical diagnostics, computer-aided tomography, pattern recognition, image processing, and data compression. In this note, we determine the computational complexity of the problem of reconstructing polyominoes from their approximately orthogonal projections. We prove that the problem is NP-complete if we reconstruct polyominoes, horizontally convex polyominoes or vertically convex polyominoes. Moreover, we give a polynomial algorithm for the reconstruction of hv-convex polyominoes that has time complexity $O(m^3 n^3)$.



1 Introduction 

1.1 Definition of Problem 

A finite binary picture is an m × n matrix of 0’s and 1’s, where the 1’s correspond to black pixels and the 0’s correspond to white pixels. The i-th row projection and the j-th column projection are the numbers of 1’s in the i-th row and in the j-th column, respectively. In a reconstruction problem, we are given two vectors $H = (h_1, \ldots, h_m) \in \{1, \ldots, n\}^m$ and $V = (v_1, \ldots, v_n) \in \{1, \ldots, m\}^n$, and we want to decide whether there exists a picture whose i-th row projection equals $h_i$ and whose j-th column projection equals $v_j$.

Often we consider additional properties such as symmetry, connectivity or convexity. In this paper, we consider three properties:

horizontally convex (h-convex): in every row the 1’s form an interval,
vertically convex (v-convex): in every column the 1’s form an interval, and
connected: the set of 1’s is connected with respect to the adjacency relation in which every pixel is adjacent to its two vertical neighbours and its two horizontal neighbours.

A connected pattern is called a polyomino. A pattern is hv-convex if it is both h-convex and v-convex.
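The definitions translate directly into checks on a given 0/1 matrix. This sketch (our illustration, hypothetical names) computes the projections and tests h-convexity, v-convexity and connectivity with respect to the adjacency relation above. It only verifies a candidate picture; it does not reconstruct one.

```python
from collections import deque


def projections(picture):
    """Row and column projections (numbers of 1's) of a 0/1 matrix."""
    return [sum(row) for row in picture], [sum(col) for col in zip(*picture)]


def is_interval(line) -> bool:
    """True if the 1's of a 0/1 sequence form one contiguous block (or there are none)."""
    ones = [k for k, x in enumerate(line) if x]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def is_hv_convex(picture) -> bool:
    return all(map(is_interval, picture)) and all(map(is_interval, zip(*picture)))


def is_polyomino(picture) -> bool:
    """Non-empty and connected w.r.t. the two vertical and two horizontal neighbours."""
    cells = {(i, j) for i, row in enumerate(picture) for j, x in enumerate(row) if x}
    if not cells:
        return False
    start = next(iter(cells))
    seen, queue = {start}, deque([start])
    while queue:                                  # breadth-first search
        i, j = queue.popleft()
        for nb in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if nb in cells and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(cells)


P = [[0, 1, 1],
     [1, 1, 0],
     [1, 0, 0]]
print(projections(P), is_hv_convex(P), is_polyomino(P))   # ([2, 2, 1], [2, 2, 1]) True True
```

A diagonal pair such as [[1, 0], [0, 1]] is hv-convex but not a polyomino, since its two pixels are not adjacent in this sense.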

* Supported by KBN grant 8 T11C 043 19
In this paper we solve the problem posed by Woeginger [5] of whether there exists a polynomial-time algorithm that takes as input a horizontal projection vector $\tilde H \in \mathbb{R}_+^m$ and a vertical projection vector $\tilde V \in \mathbb{R}_+^n$, and outputs a polyomino whose projections $H^* \in \{1, \ldots, n\}^m$ and $V^* \in \{1, \ldots, m\}^n$ approximate the vectors $\tilde H$ and $\tilde V$, respectively. We consider two notions of “approximate”:

(1) every component of $(\tilde H, \tilde V)$ differs by at most one from the corresponding component of $(H^*, V^*)$, i.e. we select only the nearest positive integers (we call this version the approximation with the absolute error), and

(2) for every $\tilde h_i$ and $\tilde v_j$ we have $|\tilde h_i - h^*_i| \le \log(\tilde h_i + 1)$ and $|\tilde v_j - v^*_j| \le \log(\tilde v_j + 1)$ for $i = 1, \ldots, m$ and $j = 1, \ldots, n$ (the approximation with the logarithmic error).

The algorithm outputs “NO” if there is no polyomino with approximate projections $\tilde V$ and $\tilde H$.



1.2 Known Results 

First Ryser [6], and subsequently Chang [2] and Wang [7], studied the existence of a pattern satisfying orthogonal projections (H, V) in the class of sets without any conditions. They showed that the decision problem can be solved in time $O(mn)$. These authors also developed algorithms that reconstruct the pattern from (H, V).

Woeginger [8] proved that the reconstruction problem in the class of polyominoes is NP-complete. Barcucci, Del Lungo, Nivat and Pinzani [1] showed that the reconstruction problem is also NP-complete in the class of h-convex polyominoes and in the class of v-convex polyominoes.

The first algorithm that establishes in polynomial time the existence of an hv-convex polyomino satisfying a pair of assigned vectors (H, V) was described by Barcucci et al. in [1]. Its time complexity is $O(m^4 n^4)$, which is rather slow. Gębala [4] showed a faster version of this algorithm with complexity $O(\min(m^2, n^2) \cdot mn \log mn)$. The latest algorithm, described by Chrobak and Dürr in [3], reconstructs an hv-convex polyomino from orthogonal projections in time $O(\min(m^2, n^2) \cdot mn)$.

All the above results concern the reconstruction of polyominoes from exact orthogonal projections (RPfOP).

1.3 Our Results 

In this paper we study the complexity of the problem of reconstructing polyominoes from approximately orthogonal projections (RPfAOP). In Section 2 we prove that RPfAOP (for both kinds of error) is NP-complete in the classes of (1) polyominoes, (2) horizontally convex polyominoes and (3) vertically convex polyominoes. In Section 3 we show that RPfAOP, for an arbitrarily chosen error function, is in P for the class of hv-convex polyominoes. We describe an algorithm that solves this problem with complexity $O(m^3 n^3)$.
2 Hardness

In this section we show a reduction from the problem of reconstructing polyominoes from exact orthogonal projections (RPfOP) to the problem of reconstructing polyominoes from approximately orthogonal projections. Let
$$H = (h_1, \ldots, h_m) \in \{1, \ldots, n\}^m, \qquad V = (v_1, \ldots, v_n) \in \{1, \ldots, m\}^n$$
be an instance of the RPfOP problem. Moreover, we assume that
$$\sum_{i=1}^{m} h_i = \sum_{j=1}^{n} v_j,$$
since otherwise a polyomino with projections (H, V) does not exist. From this instance we construct a row vector $\tilde H = (\tilde h_1, \ldots, \tilde h_m) \in \mathbb{R}_+^m$ and a column vector $\tilde V = (\tilde v_1, \ldots, \tilde v_n) \in \mathbb{R}_+^n$ adapted to the notion of error. For the approximation with the absolute error we fix
$$\tilde h_i = h_i - \tfrac{1}{2}, \quad i = 1, \ldots, m, \qquad \tilde v_j = v_j + \tfrac{1}{2}, \quad j = 1, \ldots, n.$$

For the logarithmic approximation we choose $\tilde h_i$ such that
$$h_i < \tilde h_i + \log(\tilde h_i + 1) < h_i + 1,$$
and $\tilde v_j$ such that
$$v_j - 1 < \tilde v_j - \log(\tilde v_j + 1) < v_j.$$
This choice is always possible, because the functions $x \mapsto x - \log(x + 1)$ and $x \mapsto x + \log(x + 1)$ are continuous, strictly increasing surjections on $\mathbb{R}_+$.
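The choice of $(\tilde H, \tilde V)$ is easy to carry out numerically. In this sketch (our illustration, hypothetical names) the absolute case uses $h_i - 1/2$ and $v_j + 1/2$, and the logarithmic case finds $\tilde h_i$ and $\tilde v_j$ by bisection so that $\tilde h_i + \log(\tilde h_i + 1) = h_i + 1/2$ and $\tilde v_j - \log(\tilde v_j + 1) = v_j - 1/2$, the midpoints of the required intervals. We assume the natural logarithm, under which both functions are strictly increasing on $\mathbb{R}_+$.

```python
import math


def up(x: float) -> float:
    return x + math.log(x + 1)      # strictly increasing on R+


def down(x: float) -> float:
    return x - math.log(x + 1)      # strictly increasing on R+


def invert(g, target: float, iters: int = 100) -> float:
    """Bisection for x >= 0 with g(x) = target (g continuous, increasing, g(0) = 0)."""
    lo, hi = 0.0, 2.0 * target + 2.0   # g(hi) >= target holds for both functions above
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def absolute_instance(h, v):
    return [hi - 0.5 for hi in h], [vj + 0.5 for vj in v]


def logarithmic_instance(h, v):
    # aim at the midpoints h_i + 1/2 and v_j - 1/2 of the required open intervals
    return [invert(up, hi + 0.5) for hi in h], [invert(down, vj - 0.5) for vj in v]


h, v = [2, 3, 1], [1, 3, 2]
print(absolute_instance(h, v))       # ([1.5, 2.5, 0.5], [1.5, 3.5, 2.5])
h_t, v_t = logarithmic_instance(h, v)
print([math.floor(up(x)) for x in h_t], [math.ceil(down(x)) for x in v_t])   # [2, 3, 1] [1, 3, 2]
```

The last line checks the property used in the proof of Lemma 1: rounding the largest admissible row value down and the smallest admissible column value up recovers the exact projections.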

Now we can solve the RPfAOP problem for the projections $(\tilde H, \tilde V)$.

Lemma 1. If there exists a polyomino P with row projections $H^* \in \{1, \ldots, n\}^m$ and column projections $V^* \in \{1, \ldots, m\}^n$ such that $(H^*, V^*)$ is an approximation with the absolute (logarithmic) error of $(\tilde H, \tilde V)$, then there exists a polyomino with projections (H, V).

Proof. For the polyomino P the following properties hold:

(i) $\sum_{i=1}^{m} h^*_i = \sum_{j=1}^{n} v^*_j$ (both sums equal the number of 1’s in P), and

(ii) $|h^*_i - \tilde h_i| \le 1$ and $|v^*_j - \tilde v_j| \le 1$ for the absolute error ($|h^*_i - \tilde h_i| \le \log(\tilde h_i + 1)$ and $|v^*_j - \tilde v_j| \le \log(\tilde v_j + 1)$ for the logarithmic error).
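By the definition of $(\tilde H, \tilde V)$, every admissible $h^*_i$ is at most $h_i$ and every admissible $v^*_j$ is at least $v_j$, so $\sum_i h^*_i \le \sum_i h_i = \sum_j v_j \le \sum_j v^*_j$. By (i) all these inequalities are equalities. Hence, for all i, $h^*_i$ equals the maximal admissible value, i.e.
$$h^*_i = \lfloor \tilde h_i + 1 \rfloor = h_i \qquad \bigl(h^*_i = \lfloor \tilde h_i + \log(\tilde h_i + 1) \rfloor = h_i\bigr),$$
and, for all j, $v^*_j$ equals the minimal admissible value, i.e.
$$v^*_j = \lceil \tilde v_j - 1 \rceil = v_j \qquad \bigl(v^*_j = \lceil \tilde v_j - \log(\tilde v_j + 1) \rceil = v_j\bigr).$$
Therefore P also satisfies (H, V). □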



Lemma 2. If there exists a polyomino P with projections (H, V), then there exists a polyomino with row projections $H^* \in \{1, \ldots, n\}^m$ and column projections $V^* \in \{1, \ldots, m\}^n$ such that $(H^*, V^*)$ is an approximation with the absolute (logarithmic) error of $(\tilde H, \tilde V)$.

Proof. From the definition of $(\tilde H, \tilde V)$ (for both kinds of approximation), every component of $(\tilde H, \tilde V)$ can be rounded to the corresponding component of (H, V). Hence P is also a realisation of $(\tilde H, \tilde V)$. □

Because RPfOP is NP-complete for polyominoes, h-convex polyominoes and v-convex polyominoes (see [1] and [8]), Lemma 1 and Lemma 2 give the following result.

Theorem 1. The reconstruction of polyominoes, h-convex polyominoes and v-convex polyominoes from their approximately orthogonal projections is NP-complete. □

3 hv-Convex Polyomino 

In this section we use some ideas and notation from Chrobak and Dürr [3] to describe the algorithm for reconstructing hv-convex polyominoes from approximately orthogonal projections.

In the algorithm described below we generalise the approximation error and assume that it is given by a function f that is positive on $\mathbb{R}_+$. For example, the absolute error is the constant function $f(x) = 1$ and the logarithmic error is the logarithmic function $f(x) = \log(x + 1)$. First we define some auxiliary expressions:
$$\underline{v}_j = \max\{1, \lceil v_j - f(v_j) \rceil\} \quad\text{and}\quad \overline{v}_j = \min\{m, \lfloor v_j + f(v_j) \rfloor\}$$
for $j = 1, \ldots, n$, and
$$\underline{h}_i = \max\{1, \lceil h_i - f(h_i) \rceil\} \quad\text{and}\quad \overline{h}_i = \min\{n, \lfloor h_i + f(h_i) \rfloor\}$$
for $i = 1, \ldots, m$.
Fig. 1. The convex polyomino P anchored at (p, q, r, s), with corner regions A, B, C and D.



These expressions have the following properties: $\underline{h}_i$ ($\underline{v}_j$) is the minimal and $\overline{h}_i$ ($\overline{v}_j$) the maximal positive integer value of a horizontal (vertical) projection that differs by at most $f(h_i)$ ($f(v_j)$) from $h_i$ ($v_j$).
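A direct transcription of these bounds, assuming the natural logarithm for the logarithmic error; absolute, logarithmic and bounds are hypothetical names. For row i one would call bounds(h_i, f, n), and for column j bounds(v_j, f, m). If lower exceeds upper for some line, no admissible integer projection exists for it.

```python
import math


def absolute(x: float) -> float:
    return 1.0


def logarithmic(x: float) -> float:
    return math.log(x + 1)


def bounds(value: float, f, cap: int) -> tuple:
    """Smallest and largest admissible integer projection for one row or column."""
    lower = max(1, math.ceil(value - f(value)))
    upper = min(cap, math.floor(value + f(value)))
    return lower, upper


print(bounds(2.0, absolute, 5))      # (1, 3)
print(bounds(3.0, logarithmic, 5))   # (2, 4)
print(bounds(4.6, absolute, 5))      # (4, 5)
```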

Let (i, j) denote the cell of the matrix in the i-th row and the j-th column. For an hv-convex polyomino, a set A is called an upper-left corner region (see Fig. 1) if $(i + 1, j) \in A$ or $(i, j + 1) \in A$ implies $(i, j) \in A$. The other corner regions are defined analogously. By $\bar P$ we denote the complement of P.

From the definition of hv-convex polyominoes we have the following lemma.

Lemma 3 (Chrobak and Dürr [3]). P is an hv-convex polyomino if and only if
$$\bar P = A \cup B \cup C \cup D,$$
where A, B, C, D are disjoint corner regions (upper-left, upper-right, lower-left and lower-right, respectively) such that

(i) $(i - 1, j - 1) \in A$ implies $(i, j) \notin D$, and

(ii) $(i - 1, j + 1) \in B$ implies $(i, j) \notin C$. □

We say that the hv-convex polyomino P is anchored at (p, q, r, s) if the cells $(1, p), (q, n), (m, r), (s, 1) \in P$ (i.e. these cells do not belong to any corner region).

The main idea of our algorithm is, given (H, V), the vectors of approximately orthogonal projections, to construct a 2SAT expression $F_{p,q,r,s}(H, V)$ with the property that $F_{p,q,r,s}(H, V)$ is satisfiable if and only if there exists an hv-convex polyomino with projections $(H^*, V^*)$ that is anchored at (p, q, r, s) and every component of (H, V) differs by at most the value of the function f from the corresponding component of $(H^*, V^*)$.

$F_{p,q,r,s}(H, V)$ consists of several sets of clauses, each expressing a certain property: “Corners” (Cor), “Connectivity” (Con), “Anchors” (Anc), “Lower bound on column sums” (LBC), “Upper bound on column sums” (UBC), “Lower bound on row sums” (LBR) and “Upper bound on row sums” (UBR).
(LBC) sets the minimal distance between corner regions in each column (for the j-th column it equals $\underline{v}_j$), and (UBC) sets the maximal distance between corner regions in each column (for the j-th column it equals $\overline{v}_j$). (LBR) and (UBR) are analogous for rows. Now we define the 2SAT formula
$$F_{p,q,r,s}(H, V) = Cor \wedge Con \wedge Anc_{p,q,r,s} \wedge LBC \wedge UBC_{p,r} \wedge LBR \wedge UBR_{q,s}.$$
All literals with indices outside the set $\{1, \ldots, m\} \times \{1, \ldots, n\}$ are assumed to have value 1.

Now we give the reconstruction algorithm.

Algorithm
Input: $H \in \mathbb{R}^m$, $V \in \mathbb{R}^n$
FOR p, r = 1, ..., n AND q, s = 1, ..., m DO
    IF $F_{p,q,r,s}(H, V)$ is satisfiable
        THEN RETURN $P = \overline{A \cup B \cup C \cup D}$ AND HALT
RETURN “NO”



Theorem 2. $F_{p,q,r,s}(H, V)$ is satisfiable if and only if there is an hv-convex polyomino P with projections $(H^*, V^*)$ that is anchored at (p, q, r, s) and every component of $(H^*, V^*)$ differs from the corresponding component of (H, V) by at most the value of the function f for this component.

Proof. (⇐) If P is an hv-convex polyomino with the properties stated in the theorem, let A, B, C, D be the corner regions from Lemma 3. By Lemma 3, A, B, C, D satisfy conditions (Cor) and (Con). Condition (Anc) holds because P is anchored at (p, q, r, s). Moreover, for all $i = 1, \ldots, m$ we have $|h^*_i - h_i| \le f(h_i)$ and $h^*_i \in \mathbb{N}$, hence $\underline{h}_i \le h^*_i \le \overline{h}_i$, and conditions (LBR) and (UBR) hold. Analogously, conditions (LBC) and (UBC) hold for the vertical projections.

(⇒) If $F_{p,q,r,s}(H, V)$ is satisfiable, take $P = \overline{A \cup B \cup C \cup D}$. Conditions (Cor), (Con), (LBC) and (LBR) imply that the sets A, B, C, D satisfy Lemma 3 ((LBC) and (LBR) guarantee that A, B, C, D are disjoint), and thus P is an hv-convex polyomino. Also, by (Anc), P is anchored at (p, q, r, s). Conditions (LBR) and (UBR) imply that $\underline{h}_i \le h^*_i \le \overline{h}_i$ for each row i; hence $|h^*_i - h_i| \le f(h_i)$. Analogously, conditions (LBC) and (UBC) imply that $\underline{v}_j \le v^*_j \le \overline{v}_j$ for each column j, and therefore $|v^*_j - v_j| \le f(v_j)$. Moreover, because P is a finite set, $\sum_i h^*_i = \sum_j v^*_j$. Hence P must be an hv-convex polyomino with approximately orthogonal projections (H, V) with respect to the function f. □

Each formula $F_{p,q,r,s}(H, V)$ has size $O(mn)$ and can be computed in linear time. Since a 2SAT formula can also be solved in linear time, and there are $O(m^2 n^2)$ choices of the anchor (p, q, r, s), we obtain the following result.

Theorem 3. The problem of reconstructing hv-convex polyominoes from approximately orthogonal projections can be solved in time $O(m^3 n^3)$. □
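The algorithm needs only a 2SAT decision procedure. A standard linear-time method builds the implication graph, in which a clause $(a \vee b)$ contributes the edges $\neg a \to b$ and $\neg b \to a$, and declares the formula unsatisfiable exactly when some variable lies in the same strongly connected component as its negation. The sketch below uses Kosaraju's SCC algorithm; it is a generic illustration with hypothetical names, and building the clauses Cor, Con, Anc, LBC, UBC, LBR and UBR for a given anchor is not shown.

```python
def solve_2sat(n_vars: int, clauses: list):
    """Decide a 2-CNF formula via strongly connected components (Kosaraju).

    Literals are non-zero ints: k stands for x_k and -k for not x_k (k = 1..n_vars).
    Returns a satisfying assignment as a list of bools, or None if unsatisfiable.
    """
    def node(lit: int) -> int:           # x_k -> 2(k-1), not x_k -> 2(k-1)+1
        return 2 * (abs(lit) - 1) + (1 if lit < 0 else 0)

    size = 2 * n_vars
    graph = [[] for _ in range(size)]
    rev = [[] for _ in range(size)]
    for a, b in clauses:                 # (a or b) gives not a -> b and not b -> a
        for u, w in ((node(-a), node(b)), (node(-b), node(a))):
            graph[u].append(w)
            rev[w].append(u)

    order, seen = [], [False] * size     # pass 1: finishing order (iterative DFS)
    for s in range(size):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(graph[s]))]
        while stack:
            u, it = stack[-1]
            for w in it:
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, iter(graph[w])))
                    break
            else:
                order.append(u)
                stack.pop()

    comp, count = [-1] * size, 0         # pass 2: components in topological order
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = count
        stack = [s]
        while stack:
            u = stack.pop()
            for w in rev[u]:
                if comp[w] == -1:
                    comp[w] = count
                    stack.append(w)
        count += 1

    assignment = []
    for k in range(n_vars):
        if comp[2 * k] == comp[2 * k + 1]:
            return None                  # x and not x are equivalent: contradiction
        assignment.append(comp[2 * k] > comp[2 * k + 1])
    return assignment


print(solve_2sat(3, [(1, 2), (-1, 2), (-2, 3)]))   # [True, True, True]
print(solve_2sat(1, [(1, 1), (-1, -1)]))           # None
```

Calling such a solver once for each of the $O(m^2 n^2)$ anchors on a formula of size $O(mn)$ gives the bound of Theorem 3.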
Abstract. Lamport’s Bakery algorithm is among the best known mutual exclusion algorithms. A drawback of Lamport’s algorithm is that it requires unbounded registers for communication among processes. By making a small modification to Lamport’s algorithm, we remove the need for unbounded registers. The main appeal of our algorithm lies in the fact that it overcomes a drawback of a famous algorithm while preserving its elegance.



1 Introduction 

Mutual Exclusion is a classic synchronization problem that has been extensively 
studied (see [5] for a survey of mutual exclusion algorithms). This problem is 
described as follows. There are n asynchronous processes with each process re- 
peatedly performing four sections of code: remainder section, entry section, crit- 
ical section and exit section. It is assumed that no process fails in the entry or 
exit sections and every process that enters the critical section eventually leaves 
it. The problem is to design a protocol for entry and exit sections that satisfies 
the following properties: 

Mutual Exclusion: No two processes are in the critical section at the same 
time. 

Starvation freedom: Each process in the entry section eventually enters 
the critical section. 

Wait-free exit: Each process can complete the exit section in a bounded
number of steps, regardless of the speeds of other processes. 

The following fairness property is also desirable in any mutual exclusion 
protocol: 

Doorway FIFO: The entry section begins with a straight-line code segment called
the wait-free doorway, which a process can execute in a bounded
number of steps, regardless of the speeds of other processes. If process Pi
completes executing the doorway before process Pj begins executing the
doorway, Pi enters the critical section before Pj.

L. Pacholski and P. Ruzicka (Eds.): SOFSEM 2001, LNCS 2234, pp. 261-270, 2001. 

(c) Springer-Verlag Berlin Heidelberg 2001
Lamport’s Bakery algorithm is among the best known mutual exclusion
algorithms [2]. It is discussed in introductory Operating Systems textbooks [6]
and in books on Concurrent Algorithms [1, 4, 5], and is well known to the general
Computer Science community. The algorithm satisfies all of the above properties,
is elegant, and has an intuitive appeal: processes assign themselves tokens in
the doorway, and enter the critical section in the order of their token numbers.
One drawback of Lamport’s algorithm is that it requires unbounded registers for 
communication among processes. It is this drawback that our paper overcomes: 
with a simple modification, we show how Lamport’s Bakery algorithm can be 
made to work with small bounded registers. Our algorithm adds just two lines 
and makes a small change to an existing line. Lamport’s algorithm and our al- 
gorithm are presented in Figures 1 and 2, respectively. (Lines 3 and 8 in Figure 
2 are new and line 4 is a slightly modified version of the corresponding line in 
Figure 1.) 

Our algorithm compares with Lamport’s algorithm as follows: 

Properties: Our algorithm, like Lamport’s algorithm, has all properties 
stated above, including the doorway FIFO property. 

Size of registers: Lamport’s algorithm requires unbounded registers while 
ours requires bounded registers, each of size either 1 bit or log(2n) bits. This
is the highlight of our algorithm. 

Type of registers: Lamport’s algorithm requires 2n single-writer multi- 
reader registers. Our algorithm requires the corresponding 2n single-writer 
multi-reader registers and an additional register that we call X. The register
X is written by the process in the critical section. Therefore, there is at
most one write operation on X at any time. Yet, strictly speaking, it is not a
single-writer register because different processes write to it (although never
concurrently).

Lamport’s algorithm works even if registers are only safe, but our algorithm 
requires atomic registers. 

Algorithms with better space complexity than ours are known: to our knowl- 
edge, the best algorithm, from the point of view of size of registers, is due to 
Lycklama and Hadzilacos [3]. Their algorithm has all properties stated above 
and requires only 5n safe boolean registers. Thus, the appeal of our algorithm 
is not due to its low space complexity, but because it shows that a small modi- 
fication removes a well-known drawback of a famous algorithm while preserving 
its elegance. 

1.1 Organization 

Lamport’s Bakery algorithm (Figure 1), which employs unbounded token values, 
is the starting point of our work and is described in Section 2. Our approach 
to bounding token values requires these values to be clustered. In Section 3, 
we show that Lamport’s algorithm does not have the clustering property. In 
Section 4, we present a modification of Lamport’s algorithm that continues to 
use unbounded tokens, but has the clustering property. Then, in Section 5, we 
present a bounded version of this algorithm. 
2 Lamport’s Algorithm



[initialize ∀i : 1 ≤ i ≤ n : (gettoken_i := false; token_i := −1)]

1. gettoken_i := true
2. S := {token_j | 1 ≤ j ≤ n}
3. token_i := max(S) + 1
4. gettoken_i := false
   for j ∈ {1, ..., n} − {i}
5.     wait till gettoken_j = false
6.     wait till (token_j = −1) ∨ ([token_i, i] < [token_j, j])
7. CS
8. token_i := −1

Fig. 1. Lamport’s Bakery algorithm: Pi’s protocol



We first describe Lamport’s Bakery algorithm (Figure 1). The algorithm is 
based on the following idea. When a process gets into the entry section, it assigns 
itself a token that is bigger than the existing tokens. It then waits for its turn, 
i.e., until every process with a smaller token has exited the critical section (CS). 

Each process Pi maintains two variables, token_i and gettoken_i. The purpose
of token_i is for Pi to store its token number. A value of −1 in token_i indicates
that Pi is either in the remainder section or in the process of assigning itself a
token. The variable gettoken_i is a boolean flag. Pi sets it to true when it is in
the process of assigning itself a token.

We now informally describe the actual lines of the algorithm. Pi sets gettoken_i
to true to signal that it is in the process of assigning itself a token (line 1). It
then reads the tokens held by all other processes (line 2) and assigns itself a
bigger token (line 3). Notice that if another process Pj is also computing a token
concurrently with Pi, both Pi and Pj may end up with the same token number.
Therefore, when comparing tokens (line 6), if the token values of two processes
are equal, the process ID is used to decide which token should be considered
smaller; that is, [token_i, i] < [token_j, j] is the lexicographic order on (token, ID)
pairs. Having set its token, Pi sets the flag gettoken_i to false (line 4). Lines
1-4 constitute the doorway.

Pi then considers each process Pj in turn and waits until it has “higher
priority” than Pj, i.e., until it is certain that Pj does not have (and will not
later have) a smaller token than Pi’s. This is implemented on lines 5 and 6, and
the intuition is described in the rest of this paragraph. Essentially, Pi needs to
learn Pj’s token value to determine their relative priority. Pi can learn Pj’s token
value by reading token_j except in one case: Pi cannot rely on token_j when Pj is
in the process of updating it (i.e., Pj is on lines 2 or 3). This is because, when
Pj is on lines 2 or 3, even though the value of token_j is −1, Pj may be about to
write into token_j a smaller value than Pi’s token.
Specifically, for each Pj, Pi first waits until Pj’s flag (gettoken_j) is false (line
5), i.e., until Pj is not in the process of assigning itself a token. After this wait,
Pi can be certain of the following fact: if Pj assigns itself a new token before
Pi has exited CS, Pj’s token will be bigger than Pi’s token since (on line 2) Pj
will surely read token_i. Pi then waits until either Pj is in the remainder section
or Pj’s token is bigger than its own token (line 6). At the end of this wait, Pi can
be certain that it has a higher priority than Pj: if Pj is in the entry section or
will enter the entry section, Pj’s token will be bigger than Pi’s and hence Pj
will be forced to wait at line 6 until Pi has exited CS. Thus, when the for-loop
terminates, Pi is certain that it has higher priority than all other processes, so
it enters CS (line 7). When it exits CS, it sets token_i to −1 to indicate that it is in
the remainder section.
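A minimal Python sketch of Figure 1 follows. It assumes that CPython's global interpreter lock makes each single read or write of a list element atomic, so the lists behave like atomic registers. The function name `run_bakery`, the `time.sleep(0)` yields in the busy-wait loops, and the overlap counter inside CS are our own additions for experimentation. A clean run is evidence of mutual exclusion, not a proof.

```python
import threading
import time


def run_bakery(n: int = 3, rounds: int = 20, timeout: float = 30.0):
    """Run n threads through Lamport's Bakery protocol (Fig. 1), `rounds` times each.

    Illustrative harness (not from the paper). Returns (finished, entries,
    violations): whether every thread terminated before `timeout`, how many
    CS entries happened, and how often a thread found someone else in CS.
    """
    gettoken = [False] * n  # gettoken_i
    token = [-1] * n        # token_i; -1 means "no token"
    stats = {"inside": 0, "entries": 0, "violations": 0}

    def process(i: int) -> None:
        for _ in range(rounds):
            gettoken[i] = True                     # line 1
            s = [token[j] for j in range(n)]       # line 2: one read per register
            token[i] = max(s) + 1                  # line 3
            gettoken[i] = False                    # line 4
            for j in range(n):
                if j == i:
                    continue
                while gettoken[j]:                 # line 5
                    time.sleep(0)                  # yield instead of spinning hard
                while True:                        # line 6
                    tj = token[j]                  # read token_j once per test
                    if tj == -1 or (token[i], i) < (tj, j):
                        break
                    time.sleep(0)
            stats["inside"] += 1                   # line 7: critical section
            if stats["inside"] > 1:
                stats["violations"] += 1
            stats["entries"] += 1
            time.sleep(0)                          # give other threads a chance to intrude
            stats["inside"] -= 1
            token[i] = -1                          # line 8

    threads = [threading.Thread(target=process, args=(i,), daemon=True) for i in range(n)]
    for t in threads:
        t.start()
    deadline = time.monotonic() + timeout
    for t in threads:
        t.join(max(0.0, deadline - time.monotonic()))
    finished = not any(t.is_alive() for t in threads)
    return finished, stats["entries"], stats["violations"]


print(run_bakery(3, 20))
```

If mutual exclusion holds, `run_bakery(3, 20)` should report that all threads finished, with 60 CS entries and no overlaps. Note that the token values printed by such a run keep growing with `rounds`, which is exactly the unboundedness this paper removes.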

3 Our Approach to Bounding Token Values 

In Lamport’s algorithm, let max_t and min_t denote the maximum and minimum
non-negative token values at time t (they are undefined if all tokens are
−1). Let range_t be max_t − min_t. Lamport’s algorithm is an unbounded
algorithm because max_t can increase without bound. What is more pertinent for
our purpose of bounding tokens, however, is the fact that range_t can also
increase without bound. In Section 3.1 we substantiate this claim that range_t can
grow unbounded. Our approach to deriving a bounded algorithm depends on
limiting the growth of range_t. This approach is described in Section 3.2.

3.1 Unbounded Separation of Tokens in Lamport’s Algorithm 

Below we describe a scenario to show that the value of range_t can increase
without bound. Consider a system of two processes P1, P2 in the following state:
Both P1 and P2 have just completed executing line 4. The values of token_1,
token_2 are respectively either 0, v (for some v > 0) or v, 0. The range is therefore
v. The following steps are then executed:



- P1 observes that gettoken_2 = false (line 5). P2 observes that gettoken_1 =
false (line 5).

- If [token_1, 1] < [token_2, 2], let P_i1 = P1 and P_i2 = P2. Otherwise, let
P_i1 = P2 and P_i2 = P1. Note that the value of token_i2 is v. P_i1 executes line 6,
enters and exits CS, and writes −1 in token_i1. P_i1 then begins a new invocation
of the protocol. P_i1 reads v in token_i2, computes v + 1 as the new value for
token_i1, and stops just before writing v + 1 in token_i1.

- P_i2 executes line 6, enters and exits CS, and writes −1 in token_i2. P_i2 then
begins a new invocation of the protocol. P_i2 reads −1 in token_i1, and writes
0 in token_i2. P_i2 then executes line 4.

- P_i1 writes v + 1 in token_i1 and executes line 4. The range is now v + 1. The
system state now is identical to the system state at the beginning of the
scenario, except that the range has increased by 1.
The above scenario took us from a system state with range v to a system
state with range v + 1. This scenario can be repeated arbitrarily many times,
causing the range to increase without bound.
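The schedule of Section 3.1 can be replayed deterministically. This Python sketch hard-codes the interleaving described above for two processes and records the range after each repetition. The function name `lamport_scenario` is hypothetical. The sketch shows the range growing by 1 per round.

```python
def lamport_scenario(v: int, rounds: int) -> list[int]:
    """Replay the Section 3.1 schedule under Lamport's rule token := max(S) + 1.

    Hypothetical helper. Starts with token_1 = 0 and token_2 = v (v > 0), both
    processes just past line 4, and returns max - min after each repetition.
    """
    token = {1: 0, 2: v}
    ranges = []
    for _ in range(rounds):
        # Both pass line 5; i1 holds the smaller [token, id] pair, i2 the other.
        i1, i2 = sorted(token, key=lambda p: (token[p], p))
        # i1 passes line 6, runs CS and resets its token (line 8).
        token[i1] = -1
        # i1 restarts: lines 2-3 see the other (larger) token and compute its
        # successor, but the write of that value is delayed.
        pending = max(token[i1], token[i2]) + 1
        # i2 passes line 6 (token[i1] is -1), runs CS, resets and restarts.
        token[i2] = -1
        token[i2] = max(token[i1], token[i2]) + 1  # only -1s are visible, so 0
        # i1 finally performs its delayed write; both are past line 4 again.
        token[i1] = pending
        ranges.append(max(token.values()) - min(token.values()))
    return ranges


print(lamport_scenario(1, 5))  # [2, 3, 4, 5, 6]
```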

3.2 The Main Idea 

Our approach to bounding the token values in Lamport’s algorithm consists of 
two steps. In the first step, we introduce a mechanism that forces the token 
values to form a narrow cluster, thereby ensuring a small bounded value for 
range_t. The resulting algorithm, which we call UB-Bakery, still uses unbounded
tokens, but the tokens are clustered. In the second step, we observe that the
token clustering makes it possible to replace integer arithmetic with modulo
arithmetic. The resulting algorithm, which we call B-Bakery, achieves our goal
of using small bounded token values. The first step is described in Section 4 and
the second step in Section 5.

4 Algorithm with Unbounded and Clustered Tokens 

To implement the first step described above, we introduce a new shared variable 
X which stores the token value of the latest process to visit CS. The new algo- 
rithm, UB-Bakery, is in Figure 2. It makes two simple modifications to Lamport’s 
algorithm. First, a process writes its token value in X (line 8) immediately before 
it enters CS. The second modification is on lines 3 and 4. In addition to reading
the tokens of all the processes, Pi also reads X (line 3). Pi then computes its
new token value as 1 + (maximum of the values of all processes’ tokens and X).
No other changes to Lamport’s algorithm are needed.

To see the usefulness of X, let us revisit the scenario described in the previous
section. There, P_i1 read the value v in token_i2, but P_i2 read −1 in token_i1.
Consequently, while P_i1 set its token to v + 1, P_i2 set its token to 0. This possibility,
of one process adopting a large token value and the other a small token value,
is what causes the range to increase without bound. In the new algorithm, this
is prevented because, even though P_i2 might read −1 in token_i1, it would find v
in X. As a result, P_i2’s token value would also be v + 1. More importantly, the
token values of P_i1 and P_i2 would be close to each other (in this scenario, they
would be the same).
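The same two-process schedule can be replayed with the UB-Bakery rule, where token := 1 + max of the tokens and X, and X is written just before CS. This is a hypothetical sketch, and the name `ub_scenario` is ours. It starts from tokens 0 and 1 with X = 0, an assumed starting state that respects the bound of Theorem 1 for n = 2. With X present, the range stays at 0.

```python
def ub_scenario(rounds: int) -> list[int]:
    """Replay the two-process schedule under UB-Bakery (Fig. 2).

    Hypothetical helper. Assumed start: token_1 = 0, token_2 = 1, X = 0, both
    processes just past line 5. Returns max - min after each repetition.
    """
    token = {1: 0, 2: 1}
    X = 0
    ranges = []
    for _ in range(rounds):
        i1, i2 = sorted(token, key=lambda p: (token[p], p))
        X = token[i1]                                 # line 8 for i1, then CS
        token[i1] = -1                                # line 10
        pending = max(token[i1], token[i2], X) + 1    # lines 2-4, write delayed
        X = token[i2]                                 # i2 passes line 7; line 8
        token[i2] = -1                                # line 10
        token[i2] = max(token[i1], token[i2], X) + 1  # sees -1, -1, but X helps
        token[i1] = pending
        ranges.append(max(token.values()) - min(token.values()))
    return ranges


print(ub_scenario(5))  # [0, 0, 0, 0, 0]
```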

Intuitively, the new algorithm ensures that X grows monotonically and all 
non-negative token values cluster around X. The exact nature of this clustering 
is stated and proved in the next subsection. 

4.1 Properties of UB-Bakery 

We now state the desirable clustering properties of UB-Bakery (Theorems 2 and 
3), and give proof outlines of how they are achieved. Rigorous proofs of these 
properties are provided in the full version of this paper. 

Observation 1 The value of X is non-decreasing.
[initialize X := 0;
 ∀i : 1 ≤ i ≤ n : (gettoken_i := false; token_i := −1)]

1. gettoken_i := true
2. S := {token_j | 1 ≤ j ≤ n}
3. x := X
4. token_i := max(S ∪ {x}) + 1
5. gettoken_i := false
   for j ∈ {1, ..., n} − {i}
6.     wait till gettoken_j = false
7.     wait till (token_j = −1) ∨ ([token_i, i] < [token_j, j])
8. X := token_i
9. CS
10. token_i := −1

Fig. 2. Algorithms UB-Bakery and B-Bakery: Pi’s protocol



Proof Sketch: Suppose the value of X decreases. Specifically, let x_i, x_j (x_i > x_j)
be two consecutive values of X, with x_j immediately following x_i. Let Pi, Pj be
the processes that write x_i, x_j in X, respectively. Thus, Pj immediately follows
Pi in the order of their entering CS.

If Pj reads token_i (line 2) after Pi has written x_i in token_i (line 4), then Pj
either reads x_i in token_i (line 2) or reads x_i in X (line 3). Thus, x_j, the value of
token_j computed by Pj (line 4), must be greater than x_i. This contradicts our
assumption that x_i > x_j. Therefore, Pj reads token_i (line 2) before Pi writes
x_i in token_i (line 4). Pi must subsequently wait until gettoken_j = false (line
6). Since Pj must first write x_j in token_j before setting gettoken_j to false, Pi
will read x_j in token_j when it executes line 7. Since x_i > x_j, Pi will not exit
its waiting loop w.r.t. Pj (line 7) until Pj has exited CS and set token_j to −1.
This contradicts our assumption that Pi enters CS before Pj. This completes
the proof of Observation 1.

□ 
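Observation 1 can be spot-checked empirically. The Python sketch below runs the UB-Bakery protocol of Figure 2 in threads and logs every value written to X on line 8. It assumes that CPython makes single list-element reads and writes atomic. Because these writes happen one at a time, after each process has passed all of its waits, the log order is the write order, so the log should be non-decreasing. The names `run_ub_bakery` and `x_log` are illustrative, and a passing run is evidence rather than proof.

```python
import threading
import time


def run_ub_bakery(n: int = 3, rounds: int = 20, timeout: float = 30.0):
    """Run UB-Bakery (Fig. 2) in n threads; return (finished, violations, x_log).

    Illustrative harness (not from the paper).
    """
    gettoken = [False] * n
    token = [-1] * n
    X = [0]       # shared register X, initially 0
    x_log = []    # successive values written to X on line 8
    stats = {"inside": 0, "violations": 0}

    def process(i: int) -> None:
        for _ in range(rounds):
            gettoken[i] = True                    # line 1
            s = [token[j] for j in range(n)]      # line 2
            x = X[0]                              # line 3
            token[i] = max(s + [x]) + 1           # line 4
            gettoken[i] = False                   # line 5
            for j in range(n):
                if j == i:
                    continue
                while gettoken[j]:                # line 6
                    time.sleep(0)
                while True:                       # line 7
                    tj = token[j]
                    if tj == -1 or (token[i], i) < (tj, j):
                        break
                    time.sleep(0)
            X[0] = token[i]                       # line 8
            x_log.append(token[i])
            stats["inside"] += 1                  # line 9: CS
            if stats["inside"] > 1:
                stats["violations"] += 1
            time.sleep(0)
            stats["inside"] -= 1
            token[i] = -1                         # line 10

    threads = [threading.Thread(target=process, args=(i,), daemon=True) for i in range(n)]
    for t in threads:
        t.start()
    deadline = time.monotonic() + timeout
    for t in threads:
        t.join(max(0.0, deadline - time.monotonic()))
    finished = not any(t.is_alive() for t in threads)
    return finished, stats["violations"], x_log


finished, violations, x_log = run_ub_bakery()
print(finished, violations, all(a <= b for a, b in zip(x_log, x_log[1:])))
```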

Theorem 1 (Bounded token range). At any time t and for any j, let v be
the value of token_j and x be the value of X. If v ≠ −1, then x ≤ v ≤ x + n.

Proof Sketch: This Theorem asserts that at any time t, the non-negative token
values lie in the range [x, x + n], where x is the value of X at t. We now give
an informal proof of this Theorem. Suppose Theorem 1 is false. If v < x, then
Pj has not entered CS at time t. Suppose Pj subsequently writes v in X at
time t' (prior to entering CS). The value of X decreases at some time in [t, t']
(because v < x). This violates Observation 1. If v > x + n, then there is some
integer a, where x + 1 ≤ a ≤ x + n, such that a is not the token value of any
process at t. (This is true for the following reason: excluding v, there are at
most n − 1 distinct token values at t, whereas there are n integers in the interval
[x + 1, x + n].) Since v is the value of token_j at t, and v > a, a must have
been the token value of some process Pk at some time before t. This implies that
Pk has written a in X at some time t' before t. Since a > x, the value of X
decreases at some time in the interval [t', t]. This again violates Observation 1.
This completes the proof.

□ 

Theorem 2 (Token cluster around X). Let v be the value of token_j read by
Pi executing line 2. Let x be the value of X read by Pi executing line 3. Then,
(v ≠ −1) ⇒ x − (n − 1) ≤ v ≤ x + (n − 1).

Proof Sketch: This Theorem states that any token value v read by Pi on line 2
lies within the range [x − (n − 1), x + (n − 1)], where x is the value of X read by
Pi on line 3. We say that the token values v read on line 2 cluster around the
pivot x. To establish Theorem 2, we first prove two observations:

Observation 2 Let t1 be the time Pi executes line 1 and t2 be the time Pi
executes line 3. Then, any process Pj executes line 8 (writing the value of token_j
in X) at most once in (t1, t2).

Proof Sketch: This observation is true for the following reason: Suppose Pj
executes line 8 for the first time in (t1, t2), and then begins a new invocation of
UB-Bakery. Within the interval (t1, t2), Pj cannot proceed beyond line 6 of the
invocation, where Pj waits for gettoken_i to become false (because gettoken_i = true
throughout (t1, t2)).

□ 

Observation 3 Let t1 be the time Pi executes line 1 and t2 be the time Pi
executes line 3. Let the value of token_j be v1 and v2 at t1 and t2, respectively.
Let v be the value that Pi reads in token_j on line 2. If v ≠ −1, then either v = v1
or v = v2.



Proof Sketch: Suppose v ≠ v1. Then Pj must have written v in token_j at some
time t, where t1 < t < t2. Consider the invocation by Pj during which Pj writes
v in token_j. Pj cannot exit its waiting loop on line 6 w.r.t. Pi at any time before
t2 (because gettoken_i = true throughout (t1, t2)). Therefore the value of token_j
is v at t2, i.e., v = v2.

□ 

Now we continue with the proof sketch of Theorem 2. Theorem 1 establishes
that the token range is bounded at any time. Further, the token values are
bounded below by the value of X. Observation 2 implies that the value of X
increases by no more than (n − 1) in (t1, t2). Observation 3 says that any
non-negative value of token_j read by Pi on line 2 is the value of token_j at either t1 or
t2. Manipulating the inequalities that result from these assertions gives Theorem
2.

□ 
Theorem 3 (Token cluster around token_i). Let v_i be the value of token_i
(written by Pi on line 4) when Pi is executing line 7. Let v_j be the value of token_j
read by Pi when executing line 7. If v_j ≠ −1, then v_i − (n − 1) ≤ v_j ≤ v_i + (n − 1).



Proof Sketch: This Theorem states that all non-negative token values read by
Pi on line 7 cluster around the pivot token_i (which is unchanged throughout the
interval during which Pi executes line 7). The proof proceeds as follows: Suppose
Pi reads token_j (line 7) at time t. By contradiction, suppose v_j < v_i − (n − 1)
(resp. v_i < v_j − (n − 1)) at time t. Then, there is some integer a, v_j < a < v_i
(resp. v_i < a < v_j), such that a is not the token value of any process at t. (This is
true for the following reason: there are at most n distinct token values, whereas
there are more than n integers in the interval [v_j, v_i] (resp. [v_i, v_j]).) Since v_i
(resp. v_j) is the value of token_i (resp. token_j) at t, and v_i > a (resp. v_j > a),
a must have been the token value of some process Pk at some time before t.
Therefore the value of X was a at some time t' before t. Let x be the value of X
at t. By Theorem 1, x ≤ v_j (resp. x ≤ v_i). This implies that x < a. This in turn
implies that the value of X decreases at some time in [t', t], which contradicts
Observation 1. This completes the proof.

□ 



5 Algorithm with Bounded Tokens 

In the algorithm UB-Bakery, the values in X and token_i increase without bound.
However, by exploiting token clustering (Theorems 2 and 3), it is possible to 
bound these values. This is the topic of this section. 

The main idea is that it is sufficient to maintain the value of X and the
non-negative values of token_i modulo 2n − 1. Specifically, let B-Bakery denote
the Bounded Bakery algorithm whose text is identical to UB-Bakery (Figure
2), except that the addition operation on line 4 is replaced with addition
modulo 2n − 1, denoted by ⊕, and the operators max and < (on lines 4 and 7,
respectively) are replaced with $\widehat{\max}$ and ≺, respectively (these new operators
will be defined shortly). Thus, in B-Bakery, we have X ∈ {0, 1, ..., 2n − 2} and
token_i ∈ {−1, 0, 1, ..., 2n − 2}.

Define a function f that maps values that arise in the UB-Bakery algorithm to
values that arise in the B-Bakery algorithm as follows: f(−1) = −1 and, for all v ≥ 0,
f(v) = v mod (2n − 1). For a set S, define f(S) as {f(v) | v ∈ S}.

To define the operators $\widehat{\max}$ and ≺ for B-Bakery, we consider two runs: a
run R in which processes execute the UB-Bakery algorithm and a run R' in which
processes execute the B-Bakery algorithm. It is assumed that the order in which
processes take steps is the same in R and in R'. Our goal is to define $\widehat{\max}$ and
≺ for B-Bakery so that processes behave “analogously” in R and R'. Informally,
this means that the state of each Pi at any point in run R is the same as Pi’s
state at the corresponding point in R', and the value of each shared variable at
any point in R is congruent (modulo 2n − 1) to its value at the corresponding
point in R'. We realize this goal as follows.
Definition of $\widehat{\max}$: Consider a process Pi that executes lines 2 and 3 in the
run R of the UB-Bakery algorithm. Let S_ub denote the set of token values that
Pi reads on line 2 and x_ub denote the value of X that Pi reads on line 3. Let
max_ub = max(S_ub ∪ {x_ub}).

Now consider Pi performing the corresponding steps in run R' of B-Bakery.
Let S_b denote the set of token values that Pi reads on line 2 and x_b denote
the value of X that Pi reads on line 3.

If processes behaved analogously in runs R and R' so far, we would have
x_b = f(x_ub) and S_b = f(S_ub). Further, since non-negative values in S_ub
are in the interval [x_ub − (n − 1), x_ub + (n − 1)] (by Theorem 2), if a and
b are distinct values in S_ub, then f(a) and f(b) would be distinct values
in S_b. We want to define the operator $\widehat{\max}$ so that the value $\widehat{\max}(S_b \cup \{x_b\})$
that Pi computes on line 4 is f(max_ub). This requirement is met by the definition
of $\widehat{\max}$ described in the next paragraph. (Let ⊕ and ⊖ be defined as follows:
a ⊕ b = (a + b) mod (2n − 1) and a ⊖ b = (a − b) mod (2n − 1).)

Non-negative values in S_ub lie in the interval [x_ub − (n − 1), x_ub + (n − 1)]. The
elements in this interval are ordered naturally as

$x_{ub} - (n-1) < x_{ub} - (n-2) < \dots < x_{ub} < \dots < x_{ub} + (n-2) < x_{ub} + (n-1)$.

Correspondingly, non-negative values in S_b should be ordered according to

$x_b \ominus (n-1) < x_b \ominus (n-2) < \dots < x_b < \dots < x_b \oplus (n-2) < x_b \oplus (n-1)$.

Therefore, if we shift all non-negative values in S_b ∪ {x_b} by adding n − 1 − x_b
to them (modulo 2n − 1), then the smallest possible value x_b ⊖ (n − 1) shifts to 0
and the largest possible value x_b ⊕ (n − 1) shifts to 2n − 2. Then, we can take the
ordinary maximum over the shifted values of S_b ∪ {x_b} and then shift that
maximum back to its original value. More precisely, let max be the ordinary
maximum over a set of integers, and let T = (S_b ∪ {x_b}) − {−1}. Then, we define

$\widehat{\max}(S_b \cup \{x_b\}) = \max(\{v \oplus (n-1-x_b) \mid v \in T\}) \ominus (n-1-x_b)$.

The following Lemma states the desired relationship between $\widehat{\max}$ and max that
we have established.

Lemma 1. Consider a run in which processes, including Pi, execute UB-Bakery.
Let S_ub denote the set of token values that Pi reads on line 2 and
x_ub denote the value of X that Pi reads on line 3. Let S_b = f(S_ub) and
x_b = f(x_ub). Then $\widehat{\max}(S_b \cup \{x_b\}) = f(\max(S_{ub} \cup \{x_{ub}\}))$.
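The definition of $\widehat{\max}$ translates directly into code. In this Python sketch, the names `f`, `cmax` and `check_lemma1` are ours. `cmax` shifts the residues so that the pivot x_b lands at n − 1, takes the ordinary maximum, and shifts back. `check_lemma1` compares it against f(max(S_ub ∪ {x_ub})) on random inputs drawn from the cluster guaranteed by Theorem 2, which is the content of Lemma 1.

```python
import random


def f(v: int, n: int) -> int:
    """f(-1) = -1 and f(v) = v mod (2n - 1) for v >= 0."""
    return -1 if v == -1 else v % (2 * n - 1)


def cmax(values: list[int], x: int, n: int) -> int:
    """Circular maximum of values ∪ {x} (ignoring -1), using x as the pivot."""
    m = 2 * n - 1
    shift = (n - 1 - x) % m                  # x ⊖ (n-1) -> 0 and x ⊕ (n-1) -> 2n-2
    shifted = [(v + shift) % m for v in values + [x] if v != -1]
    return (max(shifted) - shift) % m        # ordinary max, then shift back


def check_lemma1(n: int, trials: int = 2000, seed: int = 0) -> bool:
    """Test cmax(f(S_ub), f(x_ub)) == f(max(S_ub ∪ {x_ub})) on random clustered inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x_ub = rng.randrange(0, 100)
        lo = max(0, x_ub - (n - 1))
        # Each token is -1 or lies within n - 1 of x_ub, as Theorem 2 guarantees.
        s_ub = [rng.choice([-1] + list(range(lo, x_ub + n))) for _ in range(n)]
        if cmax([f(v, n) for v in s_ub], f(x_ub, n), n) != f(max(s_ub + [x_ub]), n):
            return False
    return True


print(cmax([3, 1, -1], 0, 3))                     # 1
print(all(check_lemma1(n) for n in range(1, 7)))  # True
```

For n = 3 (modulus 5) with pivot x_b = 0, the residues 3 and 1 stand for the UB-Bakery values x_ub − 2 and x_ub + 1. So `cmax([3, 1, -1], 0, 3)` returns 1, whereas the ordinary maximum of the residues would wrongly pick 3.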

Definition of ≺: Consider a process Pi executing line 7 in run R of the UB-Bakery
algorithm. Let v_j be the value that Pi reads in token_j and v_i be the value
of token_i.

Now consider the corresponding step of Pi in the run R' of B-Bakery. Let v'_j
be the value that Pi reads in token_j and v'_i be the value of token_i.
If processes behaved analogously in runs R and R' so far, we would have
v'_i = f(v_i) and v'_j = f(v_j). By Theorem 3, if v_j ≠ −1 then v_j is in the
interval [v_i − (n − 1), v_i + (n − 1)]. We want to define ≺ so that [v'_i, i] ≺ [v'_j, j]
holds if and only if [v_i, i] < [v_j, j] holds. To achieve this, we proceed as
before by shifting both v'_i and v'_j by adding (n − 1 − v'_i) (modulo 2n − 1) and
then comparing the shifted values using the ordinary “less than” relation for
integers. More precisely, ≺ is defined as follows: [v'_i, i] ≺ [v'_j, j] if and only if

$[v'_i \oplus (n-1-v'_i), i] < [v'_j \oplus (n-1-v'_i), j]$.

The following Lemma states the
desired relationship between ≺ and < that we have established.

Lemma 2. Consider a run in which processes, including Pi, execute UB-Bakery.
Let v_j be the value that Pi reads (on line 7) in token_j and v_i be the
value of token_i when Pi is executing line 7. Let v'_i = f(v_i) and v'_j = f(v_j).
Then [v'_i, i] ≺ [v'_j, j] holds if and only if [v_i, i] < [v_j, j] holds.
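Similarly, ≺ is a shifted lexicographic comparison. The Python sketch below implements it, and the names `prec` and `check_lemma2` are ours. It checks Lemma 2 exhaustively for small n over all token pairs within distance n − 1 of each other, which is the cluster Theorem 3 guarantees.

```python
def prec(vi: int, i: int, vj: int, j: int, n: int) -> bool:
    """[vi, i] ≺ [vj, j]: shift both residues by n - 1 - vi (mod 2n - 1), compare pairs."""
    m = 2 * n - 1
    shift = (n - 1 - vi) % m
    return ((vi + shift) % m, i) < ((vj + shift) % m, j)


def check_lemma2(n: int, base: int = 50) -> bool:
    """Compare ≺ on residues with < on integers for all tokens within n - 1 of each other."""
    m = 2 * n - 1
    for vi in range(base, base + 2 * m):          # every residue class of vi, twice
        for vj in range(vi - (n - 1), vi + n):    # cluster guaranteed by Theorem 3
            for i in range(n):
                for j in range(n):
                    if i != j and prec(vi % m, i, vj % m, j, n) != ((vi, i) < (vj, j)):
                        return False
    return True


print(prec(4, 0, 0, 1, 3))                        # True (stands for 9 < 10)
print(all(check_lemma2(n) for n in range(1, 7)))  # True
```

For n = 3, the UB-Bakery tokens 9 and 10 become residues 4 and 0. `prec(4, 0, 0, 1, 3)` is True, matching 9 < 10, while a plain comparison 4 < 0 of the residues would be False.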

To summarize, B-Bakery, our final algorithm that uses bounded tokens, is
identical to UB-Bakery (Figure 2) with the operators +, max and < replaced
with ⊕, $\widehat{\max}$ and ≺ as defined above. The following theorem states our result.
Formal proof of this theorem is presented in the full version of this paper. 

Theorem 4. The algorithm B-Bakery satisfies the following properties: 

- Mutual Exclusion: No two processes can be simultaneously in CS.

- Starvation Freedom: Each process that invokes B-Bakery eventually enters
CS.

- Doorway FIFO: Let t be the time when Pi writes in token_i when executing
line 4. If Pj initiates an invocation after t, then Pi enters CS before Pj.

- Bounded Registers: Every shared register is either a boolean or has log(2n)
bits.
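Putting the pieces together gives a threaded Python sketch of B-Bakery: Figure 2 with ⊕, $\widehat{\max}$ and ≺ in place of +, max and <. It assumes that CPython's single list-element reads and writes are atomic. The function name and the diagnostic counters are our own. Besides checking for overlaps in CS, it records every token value written, which should stay within {0, ..., 2n − 2}, and the values written to X, which wrap around modulo 2n − 1.

```python
import threading
import time


def run_b_bakery(n: int = 3, rounds: int = 20, timeout: float = 30.0):
    """Run B-Bakery in n threads; return (finished, violations, x_log, written).

    Illustrative harness (not from the paper).
    """
    m = 2 * n - 1  # all token arithmetic is modulo 2n - 1

    def cmax(values, x):
        # Circular maximum of values ∪ {x}, ignoring -1, with x as the pivot.
        shift = (n - 1 - x) % m
        return (max((v + shift) % m for v in values + [x] if v != -1) - shift) % m

    def prec(vi, i, vj, j):
        # [vi, i] ≺ [vj, j]: shift by n - 1 - vi, then compare lexicographically.
        shift = (n - 1 - vi) % m
        return ((vi + shift) % m, i) < ((vj + shift) % m, j)

    gettoken = [False] * n
    token = [-1] * n
    X = [0]
    x_log, written = [], set()
    stats = {"inside": 0, "violations": 0}

    def process(i: int) -> None:
        for _ in range(rounds):
            gettoken[i] = True                    # line 1
            s = [token[j] for j in range(n)]      # line 2
            x = X[0]                              # line 3
            token[i] = (cmax(s, x) + 1) % m       # line 4: circular max, then ⊕ 1
            written.add(token[i])
            gettoken[i] = False                   # line 5
            for j in range(n):
                if j == i:
                    continue
                while gettoken[j]:                # line 6
                    time.sleep(0)
                while True:                       # line 7, using ≺
                    tj = token[j]
                    if tj == -1 or prec(token[i], i, tj, j):
                        break
                    time.sleep(0)
            X[0] = token[i]                       # line 8
            x_log.append(token[i])
            stats["inside"] += 1                  # line 9: CS
            if stats["inside"] > 1:
                stats["violations"] += 1
            time.sleep(0)
            stats["inside"] -= 1
            token[i] = -1                         # line 10

    threads = [threading.Thread(target=process, args=(i,), daemon=True) for i in range(n)]
    for t in threads:
        t.start()
    deadline = time.monotonic() + timeout
    for t in threads:
        t.join(max(0.0, deadline - time.monotonic()))
    finished = not any(t.is_alive() for t in threads)
    return finished, stats["violations"], x_log, written


finished, violations, x_log, written = run_b_bakery()
print(finished, violations, sorted(written))
```

With n = 3, every token should lie in {0, ..., 4}. The X log should decrease at some point, because the corresponding UB-Bakery values keep growing past the modulus.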
Abstract. It is common practice to apply linear or nonlinear feature extraction
methods before classification. Usually linear methods are faster
and simpler than nonlinear ones but an idea successfully employed in 
the nonlinearization of Support Vector Machines permits a simple and 
effective extension of several statistical methods to their nonlinear coun- 
terparts. In this paper we follow this general nonlinearization approach 
in the context of Independent Component Analysis, which is a general 
purpose statistical method for blind source separation and feature ex- 
traction. In addition, nonlinearized formulae are furnished along with an 
illustration of the usefulness of the proposed method as an unsupervised 
feature extractor for the classification of Hungarian phonemes. 

Keywords: feature extraction, kernel methods, Independent Component
Analysis, FastICA, phoneme classification



1 Introduction 

Feature extraction methods, whether linear or nonlinear, produce preprocessing
transformations of high-dimensional input data, which may increase
the overall performance of classifiers in many real-world applications. These
algorithms also permit the restriction of the entire input space to a subspace of lower
dimensionality. In general, experience has shown that dimensionality reduction
has a favorable effect on classification performance, since it removes superfluous
features that can hinder separation.

In this study Independent Component Analysis will be derived in a nonlinearized
form, where the nonlinearization is performed by employing
the so-called “kernel-idea” [11]. This notion can be traced back to the potential
function method [1] and to its renewed use in the ubiquitous Support Vector
Machine [4], [21].

Without loss of generality we shall assume that, as a realization of multivariate
random variables, there are m-dimensional real attribute vectors in a compact
set $\mathcal{X}$ over $\mathbb{R}^m$ describing objects in a certain domain, and that we have a
finite m × n sample matrix X = [x_1, ..., x_n] containing n random observations.



L. Pacholski and P. Ruzicka (Eds.): SOFSEM 2001, LNCS 2234, pp. 271-281, 2001. 
© Springer-Verlag Berlin Heidelberg 2001







Andras Kocsor and Janos Csirik 



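[Figure: the “kernel-idea”]

Fig. 1. The “kernel-idea”. T is the closure of the linear span of the mapped data. The
dot product in the kernel feature space T is defined implicitly. The dot product of
$\sum_{r=1}^{n} \alpha_r \phi(\mathbf{x}_r)$ and $\sum_{r=1}^{n} \beta_r \phi(\mathbf{x}_r)$ is $\sum_{r=1}^{n} \sum_{s=1}^{n} \alpha_r \beta_s k(\mathbf{x}_r, \mathbf{x}_s)$.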



The aim of Independent Component Analysis (ICA) is to linearly transform 
the sample matrix X into components that are as independent as possible. The 
definition of independence of the components can be viewed in different ways. In 
[8] and [9] Hyvarinen proposed a new concept and a new method (i.e. FastICA) 
that extends Comon’s information theoretic ICA approach [6] with a new family 
of contrast functions. FastICA is a fast approximate Newton iteration procedure 
(the convergence is at least quadratic) for the optimization of the negentropy 
approximant (see definition later), which serves as a measure for selecting new 
independent components. Fortunately, this method can be reexpressed so that its
input is $K = X^\top X$ instead of X, where the n × n symmetric matrix K is the
pairwise combination of dot products of the sample ($K = [\mathbf{x}_i \cdot \mathbf{x}_j]_{ij}$). Now let
the dot product be implicitly defined (Fig. 1) by the kernel function k in some
finite or infinite dimensional feature space T with associated transformation φ:

$k(\mathbf{x}, \mathbf{y}) = \phi(\mathbf{x}) \cdot \phi(\mathbf{y})$. (1)

Going the other way, constructing an appropriate kernel function (i.e. one for which
such a function φ exists) is a non-trivial problem, but there are many good
suggestions about the sorts of kernel functions which might be adopted, along
with some background theory [21], [5]. The two most popular kernels, however,
are the following:

Polynomial kernel: $k_1(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y})^d, \quad d \in \mathbb{N}$, (2)

Gaussian RBF kernel: $k_2(\mathbf{x}, \mathbf{y}) = \exp\left(-\|\mathbf{x} - \mathbf{y}\|^2 / r\right), \quad r \in \mathbb{R}^+$. (3)
For a given kernel function the pair (φ, dim T) is not always unique, and for the
kernels k1 and k2 the following statements hold:

i) If the dot product is computed as a polynomial kernel, then the dimension
of the feature space is at least $\binom{m+d-1}{d}$.

ii) The dot product using the Gaussian RBF kernel induces infinite-dimensional
feature spaces.
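The kernels (2) and (3) and statement i) can be checked on a tiny case. In this Python sketch the helper names are ours. `phi_poly2` is the standard explicit feature map of the degree-2 polynomial kernel for m = 2. Its dot product reproduces k1, and its length is $\binom{m+d-1}{d} = 3$. No explicit map is given for the RBF kernel, because its feature space is infinite-dimensional.

```python
import math
from math import comb


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def poly_kernel(x, y, d):
    """k1(x, y) = (x . y)^d, eq. (2)."""
    return dot(x, y) ** d


def rbf_kernel(x, y, r):
    """k2(x, y) = exp(-||x - y||^2 / r), eq. (3); r > 0."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / r)


def phi_poly2(x):
    """Explicit feature map for d = 2, m = 2 (illustrative helper, not from the paper)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x, y = (1.0, 2.0), (3.0, -1.0)
print(poly_kernel(x, y, 2), dot(phi_poly2(x), phi_poly2(y)))  # equal up to rounding
print(len(phi_poly2(x)) == comb(2 + 2 - 1, 2))                # True: 3 = C(m+d-1, d)
```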

As the input of FastICA is represented only by dot products, matrix K is easily
redefinable by

$K = [k(\mathbf{x}_i, \mathbf{x}_j)]_{ij}$. (4)

With this substitution FastICA produces a linear transformation matrix in the 
kernel feature space but now this is no longer a linear transformation of the 
input data owing to the nonlinearity of φ. Still, dot products in T computed
with kernels offer a fast implicit access to this space that in turn leads to a
low-complexity nonlinear extractor. If we have a low-complexity (perhaps linear)
kernel function, the dot product φ(x_i) · φ(x_j) can also be computed with fewer
operations (e.g. O(m)), whether or not φ(x) is infinite in dimension.
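Substituting kernels as in (4) only requires building the n × n matrix K. Below is a minimal Python sketch, assuming the samples are given as a list of m-dimensional tuples (the columns of X). The names `gram_matrix`, `linear_kernel` and the toy data are illustrative. With the linear kernel, K equals X^T X. With any kernel, each entry costs one kernel evaluation, for example O(m) for (2) and (3), independently of the dimension of T.

```python
import math


def gram_matrix(samples, k):
    """K = [k(x_i, x_j)]_{ij} for samples x_1, ..., x_n (the columns of X), eq. (4)."""
    n = len(samples)
    return [[k(samples[i], samples[j]) for j in range(n)] for i in range(n)]


def linear_kernel(x, y):
    """Ordinary dot product: with this kernel K coincides with X^T X."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, r=1.0):
    """Gaussian RBF kernel of eq. (3); r = 1.0 is an arbitrary illustrative choice."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / r)


samples = [(0.0, 1.0), (1.0, 1.0), (2.0, -1.0)]  # toy data: n = 3 observations, m = 2
K_lin = gram_matrix(samples, linear_kernel)
K_rbf = gram_matrix(samples, rbf_kernel)
print(K_lin)  # [[1.0, 1.0, -1.0], [1.0, 2.0, 1.0], [-1.0, 1.0, 5.0]]
```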

Using this general schema various feature extraction methods such as Princi- 
pal Component Analysis (the first generalization right after SVM, proposed by 
Scholkopf et al.) [19], [15], [13], Linear Discriminant Analysis [16], [18], [20] and 
Independent Component Analysis have already been nonlinearized. Hopefully,
nonlinear counterparts of other statistical methods will also emerge in the near
future.

In the subsequent section we will review the standard Independent Compo- 
nent Analysis and FastICA algorithms. Afterwards we will reformulate ICA in 
such a way that its input is the dot product of the input data, and then this 
basic operation will be replaced by kernel functions. The final part of the paper 
will discuss the results of experiments on phoneme classification, followed by
some concluding remarks. 

2 Independent Component Analysis 

Independent Component Analysis [6], [8], [9], [7] is a general purpose statistical 
method that originally arose from the study of blind source separation (BSS). 
A typical BSS problem is the cocktail-party problem where several people are 
speaking simultaneously in the same room and several microphones record a 
mixture of speech signals. The task is to separate the voices of different speakers 
using the recorded samples. Another application of ICA is feature extraction, 
where the aim is to linearly transform the input data into uncorrelated com- 
ponents, along which the distribution of the sample set is the least Gaussian. 
The reason for this is that along these directions the data is supposedly easier 
to classify. For optimal selection of the independent directions, several contrast
functions were defined using approximately equivalent approaches. Here we follow
the approach proposed by Hyvarinen [8], [9], [7]. Generally speaking, we expect
these functions to be non-negative and to have a zero value for a Gaussian
distribution. Negentropy is a useful measure having just this property, used for assessing
non-Gaussianity (i.e. the least Gaussianity). Since obtaining this quantity via its
definition is computationally rather difficult, a simple, easily computable
approximation is normally employed. The negentropy of a variable η with zero mean
and unit variance is estimated by the formula

$$J_G(\eta) \approx \big(E\{G(\eta)\} - E\{G(\nu)\}\big)^2 \qquad (5)$$

where $G: \mathbb{R} \to \mathbb{R}$ is an appropriate nonquadratic function, $E$ denotes the expected value and $\nu$ is a standardized Gaussian variable. The following three choices of $G(\eta)$ are conventionally used: $\eta^4$, $\log(\cosh(\eta))$ or $-\exp(-\eta^2/2)$. Note that in (5) the expected value of $G(\nu)$ is a constant whose value depends only on the selected contrast function (e.g. $E\{G_1(\nu)\} = 3$ for $G_1(\eta) = \eta^4$). Hyvärinen recently proposed a fast iterative algorithm called FastICA for selecting the new base vectors of the linearly transformed space. The goodness of a new direction $a$ is measured by the following function, obtained by replacing $\eta$ with $a \cdot x$ in the negentropy approximation (5):

$$\tau_G(a) = \big(E\{G(a \cdot x)\} - E\{G(\nu)\}\big)^2 \qquad (6)$$
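To make (5) concrete, the following NumPy sketch estimates the negentropy approximation for a one-dimensional sample. It assumes $G(\eta) = \log\cosh\eta$, evaluates the constant $E\{G(\nu)\}$ by a Riemann sum over $[-10, 10]$ (a truncation chosen here, not taken from the paper), and re-standardizes the sample first, since (5) presumes zero mean and unit variance. The function names are our own; this is an illustration only, not the FastICA reference code.

```python
import numpy as np

def gaussian_expectation(G, lim=10.0, num=200001):
    """E{G(nu)} for a standard Gaussian nu, by a Riemann sum over [-lim, lim]."""
    u = np.linspace(-lim, lim, num)
    pdf = np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)
    return float(np.sum(G(u) * pdf) * (u[1] - u[0]))

def negentropy_approx(sample, G):
    """Approximation (5): (E{G(eta)} - E{G(nu)})^2 for a zero-mean, unit-variance eta."""
    eta = (sample - sample.mean()) / sample.std()   # enforce the preconditions of (5)
    return float((np.mean(G(eta)) - gaussian_expectation(G)) ** 2)

def log_cosh(u):
    return np.log(np.cosh(u))

rng = np.random.default_rng(0)
j_gauss = negentropy_approx(rng.standard_normal(100_000), log_cosh)
j_unif = negentropy_approx(rng.uniform(-1.0, 1.0, 100_000), log_cosh)
print(j_gauss, j_unif)
```

A uniformly distributed sample should receive a clearly larger value than a Gaussian one, whose estimate reflects only sampling noise.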

In fact, FastICA is an approximate Newton iteration procedure for the local optimization of the function $\tau_G(a)$. Before running FastICA, however, the raw input data $X$ must first be preprocessed by centering and whitening it. Between centering and whitening we may also apply a variance normalization (dividing each coordinate by its standard deviation), since feeding standardized data into the whitening step sometimes improves the efficiency of the FastICA algorithm. We should mention here that there are many other iterative methods for performing Independent Component Analysis. Some of these (like FastICA) require centering and whitening, while others do not. In our experience, all these algorithms tend to converge faster on centered and whitened data, even those which do not strictly require it.

Centering. An essential step is to shift the original sample set $x_1, \ldots, x_n$ by its mean $\mu = E\{x\}$, to obtain data $x'_1 = x_1 - \mu, \ldots, x'_n = x_n - \mu$ with a mean of zero.

Whitening. The goal of this step is to transform the centered samples $x'_1, \ldots, x'_n$ via a linear transformation $Q$ into a space where the covariance matrix $C = E\{xx^\top\}$ of the points $x_1 = Q^\top x'_1, \ldots, x_n = Q^\top x'_n$ is the unit matrix. Since standard principal component analysis [10] transforms the covariance matrix into a diagonal form, the diagonal elements being the eigenvalues of the original covariance matrix $C' = E\{x'x'^\top\}$, it only remains to scale each diagonal element to one. The sample covariance matrix $C'$ is clearly symmetric positive semidefinite, so its eigenvectors are orthogonal and the corresponding real eigenvalues are nonnegative. If we further assume that the eigenpairs of $C'$ are $(c_1, \lambda_1), \ldots, (c_m, \lambda_m)$ with $\lambda_1 \ge \ldots \ge \lambda_m$, the transformation matrix $Q$ takes the form $[c_1\lambda_1^{-1/2}, \ldots, c_k\lambda_k^{-1/2}]$, where $\lambda_k > 0$. If $k$ is less than $m$, the whitening also reduces the dimensionality.
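A minimal NumPy sketch of the centering and whitening steps, assuming one sample per row of `X`; `center_and_whiten` is a hypothetical helper name, and the mixing matrix in the usage lines is arbitrary. It builds $Q$ from the eigenpairs of $C'$ as described above and checks that the whitened points have unit covariance, also when $k < m$.

```python
import numpy as np

def center_and_whiten(X, k=None):
    """Centre the rows of X (n samples x m dims) and whiten them via PCA.

    Returns mu, the m x k matrix Q = [c_1 lam_1^(-1/2), ..., c_k lam_k^(-1/2)]
    and the whitened samples Q^T (x_i - mu), stacked as the rows of Z.
    """
    mu = X.mean(axis=0)
    Xc = X - mu                               # centering: x'_i = x_i - mu
    C = Xc.T @ Xc / X.shape[0]                # covariance C' = E{x' x'^T}
    lam, c = np.linalg.eigh(C)                # C' is symmetric PSD, so eigh applies
    order = np.argsort(lam)[::-1]             # lam_1 >= lam_2 >= ... >= lam_m
    lam, c = lam[order], c[:, order]
    k = X.shape[1] if k is None else k        # k < m also reduces the dimension
    Q = c[:, :k] / np.sqrt(lam[:k])           # scale eigenvector c_i by lam_i^(-1/2)
    return mu, Q, Xc @ Q

rng = np.random.default_rng(1)
A = np.array([[2.0, 0.5, 0.0], [0.3, 1.0, 0.2], [0.0, 0.4, 0.7]])   # arbitrary mixing
X = rng.standard_normal((5000, 3)) @ A.T + np.array([1.0, -2.0, 3.0])
mu, Q, Z = center_and_whiten(X)
print(np.round(Z.T @ Z / len(Z), 6))          # the 3 x 3 unit matrix, up to rounding
```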
Properties of the preprocessing stage. Firstly, after centering and whitening, for every normalized $a$ the mean of $a \cdot x_1, \ldots, a \cdot x_n$ is zero and its variance is one. We need this because (5) requires $\eta$ to have zero mean and unit variance; hence, with the substitution $\eta = a \cdot x$, the projected data $a \cdot x$ must also have this property. Secondly, for any matrix $W$ the covariance matrix $C_W$ of the transformed points $Wx_1, \ldots, Wx_n$ remains the unit matrix if and only if $W$ is orthogonal, since



$$C_W = E\{Wx(Wx)^\top\} = W E\{xx^\top\} W^\top = W I W^\top = W W^\top \qquad (7)$$

FastICA. After preprocessing, this method finds a new orthogonal base $W$ for the preprocessed data, such that the values of the non-Gaussianity measure $\tau_G$ for the base vectors are large². The following pseudo-code gives further details³:



% The input for this algorithm is the sample matrix X and the
% nonlinear function G, while the output is the transformation
% matrix W. The first and second order derivatives of G are
% denoted by G' and G''. (W_i W_i^T)^(-1/2) W_i is a symmetric
% decorrelation, where (W_i W_i^T)^(-1/2) can be obtained from its
% eigenvalue decomposition: if W_i W_i^T = E D E^T, then
% (W_i W_i^T)^(-1/2) is equal to E D^(-1/2) E^T.
procedure FastICA(X, G);
  mu = E{x}; x' = x - mu; x = Q^T x';        % centering & whitening
  let W_0 be a random m x m orthogonal matrix;
  W_0 = (W_0 W_0^T)^(-1/2) W_0;
  i = 0;
  while W_i has not converged do
    for j = 1 to m
      let s_j be the jth row vector of W_i;
      w_j = E{x G'(s_j . x)} - E{G''(s_j . x)} s_j;
    end;
    i = i + 1;
    W_i = [w_1, ..., w_m]^T;
    W_i = (W_i W_i^T)^(-1/2) W_i;
  od
end procedure



Transformation of test vectors. For an arbitrary test vector $y \in X$ the transformation can be done using $y^* = W Q^\top (y - \mu)$. Here $W$ denotes the orthogonal transformation matrix obtained as the output of FastICA, while $Q$ is the matrix obtained from whitening.
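The pseudo-code and the test-vector transformation can be sketched in NumPy as follows. This is our own minimal reading of the procedure, not the MatLab code of [7]: it assumes $G(\eta) = \log\cosh\eta$ (so $G' = \tanh$ and $G'' = 1 - \tanh^2$), updates all rows of $W$ in one vectorised step, and stops when every new row is parallel to the old one up to sign. The tolerance, iteration cap and seed are arbitrary choices, and all function names are hypothetical.

```python
import numpy as np

def center_and_whiten(X):
    mu = X.mean(axis=0)
    Xc = X - mu
    lam, c = np.linalg.eigh(Xc.T @ Xc / X.shape[0])
    Q = c / np.sqrt(lam)                  # columns c_i * lam_i^(-1/2); order is irrelevant here
    return mu, Q, Xc @ Q

def symmetric_decorrelation(W):
    """(W W^T)^(-1/2) W via the eigendecomposition W W^T = E D E^T."""
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(d ** -0.5) @ E.T @ W

def fastica(X, max_iter=200, tol=1e-10, seed=0):
    """Symmetric FastICA with G(u) = log cosh u; returns mu, Q and the orthogonal W."""
    mu, Q, Z = center_and_whiten(X)       # rows of Z are the whitened samples x
    n, m = Z.shape
    rng = np.random.default_rng(seed)
    W = symmetric_decorrelation(rng.standard_normal((m, m)))   # random orthogonal W_0
    for _ in range(max_iter):
        g = np.tanh(Z @ W.T)              # g[i, j] = G'(s_j . x_i), s_j = j-th row of W
        # w_j = E{x G'(s_j . x)} - E{G''(s_j . x)} s_j, for all rows j at once
        W_new = g.T @ Z / n - (1.0 - g ** 2).mean(axis=0)[:, None] * W
        W_new = symmetric_decorrelation(W_new)
        # converged when every new row is parallel (up to sign) to the old one
        done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if done:
            break
    return mu, Q, W

def transform(Y, mu, Q, W):
    """y* = W Q^T (y - mu) for every row y of Y."""
    return (Y - mu) @ Q @ W.T

rng = np.random.default_rng(2)
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(20000, 2))   # independent unit-variance sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])                      # arbitrary mixing matrix
X = S @ A.T
mu, Q, W = fastica(X)
M = W @ Q.T @ A                      # close to a signed permutation matrix
print(np.round(M, 2))
```

If the sources are recovered, $WQ^\top A$ is close to a signed permutation matrix, i.e. the sources are found up to order and sign; the transformed training data keep a unit covariance because $W$ is orthogonal.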

² Note that since the data remain whitened after an orthogonal transformation, ICA can be considered an extension of PCA.

³ MatLab code is available in [7].
3 Independent Component Analysis with Kernels



In this section we derive the kernel counterpart of FastICA. To this end, let the inner product in the feature space $\mathcal{F}$ be implicitly defined by the kernel function $k$ with associated transformation $\phi$. We only need to extend the centering and whitening of the data nonlinearly: after kernel centering and whitening, the data are represented by ordinary finite-dimensional coordinate vectors, so the iterative section of FastICA can be applied to them unchanged and need not be nonlinearized.

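Centering in $\mathcal{F}$. We shift the data $\phi(x_1), \ldots, \phi(x_n)$ by their mean $\mu^\phi$ to obtain data $\phi'(x_1), \ldots, \phi'(x_n)$ with a mean of zero:

$$\phi'(x_1) = \phi(x_1) - \mu^\phi, \;\ldots,\; \phi'(x_n) = \phi(x_n) - \mu^\phi, \qquad \mu^\phi = \frac{1}{n}\sum_{i=1}^{n}\phi(x_i) \qquad (8)$$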

Whitening in $\mathcal{F}$. Much like in linear ICA, the goal of this step is to find a transformation $Q^\phi$ such that the covariance matrix

$$\frac{1}{n}\sum_{i=1}^{n}\hat\phi(x_i)\,\hat\phi(x_i)^\top \qquad (9)$$

of the sample $\hat\phi(x_1) = Q^{\phi\top}\phi'(x_1), \ldots, \hat\phi(x_n) = Q^{\phi\top}\phi'(x_n)$ is a unit matrix.

As we saw earlier, the column vectors of $Q^\phi$ are the weighted eigenvectors of the positive semidefinite covariance matrix $C^\phi = \frac{1}{n}\sum_{i=1}^{n}\phi'(x_i)\phi'(x_i)^\top$. Because this eigenproblem is equivalent to determining the stationary points of the Rayleigh quotient

$$\frac{a^\top C^\phi a}{a^\top a}, \qquad 0 \ne a \in \mathcal{F}, \qquad (10)$$

this formula will be rearranged as an expression of dot products of the input data. Owing to the special form of $C^\phi$, we suppose that when we search for stationary points, $a$ has the form



$$a = \sum_{i=1}^{n}\alpha_i\,\phi'(x_i). \qquad (11)$$



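We may arrive at this assumption in various ways, e.g. by decomposing an arbitrary vector $a$ into $a_1 + a_2$, where $a_1$ is the component of $a$ which falls in $\mathrm{SPAN}(\phi'(x_1), \ldots, \phi'(x_n))$, while $a_2$ is the component perpendicular to it. Then from the derivation of (10) we see that $a_2 \cdot a_2 = 0$ for the stationary points. The following formula gives the Rayleigh quotient as a function of $\alpha$ and $k(x_i, x_j)$:

$$\frac{a^\top C^\phi a}{a^\top a} = \frac{\left(\sum_{t=1}^{n}\alpha_t\phi'(x_t)^\top\right) C^\phi \left(\sum_{k=1}^{n}\alpha_k\phi'(x_k)\right)}{\left(\sum_{t=1}^{n}\alpha_t\phi'(x_t)^\top\right)\left(\sum_{k=1}^{n}\alpha_k\phi'(x_k)\right)} = \frac{\frac{1}{n}\alpha^\top K K \alpha}{\alpha^\top K \alpha} \qquad (12)$$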
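where

$$K_{tk} = \Big(\phi(x_t) - \frac{1}{n}\sum_{i=1}^{n}\phi(x_i)\Big)^{\!\top}\Big(\phi(x_k) - \frac{1}{n}\sum_{i=1}^{n}\phi(x_i)\Big) = k(x_t, x_k) - \frac{1}{n}\sum_{i=1}^{n}\big(k(x_i, x_k) + k(x_t, x_i)\big) + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}k(x_i, x_j) \qquad (13)$$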
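Formula (13) can be written in matrix form as $K - \mathbf{1}K - K\mathbf{1} + \mathbf{1}K\mathbf{1}$, where $\mathbf{1}$ is the $n \times n$ matrix whose entries are all $1/n$. The sketch below checks this against explicit centering in $\mathcal{F}$ for the homogeneous polynomial kernel $k(x, y) = (x \cdot y)^2$, chosen only because its feature map $(x_1^2, \sqrt{2}x_1x_2, x_2^2)$ is small enough to write down; the helper names are hypothetical.

```python
import numpy as np

def center_gram(K):
    """Kernel centering (13) in matrix form: K - 1K - K1 + 1K1 with 1 = ones(n, n) / n."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    return K - one @ K - K @ one + one @ K @ one

def poly2_kernel(A, B):
    """Illustrative kernel k(x, y) = (x . y)^2 for 2-D inputs."""
    return (A @ B.T) ** 2

def poly2_features(A):
    """Explicit feature map of poly2_kernel: phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)."""
    return np.column_stack([A[:, 0] ** 2, np.sqrt(2.0) * A[:, 0] * A[:, 1], A[:, 1] ** 2])

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 2))
K_centered = center_gram(poly2_kernel(X, X))
Phi = poly2_features(X)
Phi_centered = Phi - Phi.mean(axis=0)          # explicit centering in F, as in (8)
print(np.allclose(K_centered, Phi_centered @ Phi_centered.T))
```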

From differentiating (12) with respect to $\alpha$ we see that the stationary points are the solution vectors of the general eigenvalue problem $\frac{1}{n}KK\alpha = \lambda K\alpha$, which for our purposes is equivalent to the problem $\frac{1}{n}K\alpha = \lambda\alpha$ (for $\lambda \ne 0$, solutions of the former differ from those of the latter only by vectors in the null space of $K$, which do not change $a$). Moreover, since $k(x_t, x_k) = k(x_k, x_t)$ and⁴ $\alpha^\top \frac{1}{n}K\alpha = \frac{1}{n}a^\top a \ge 0$, the matrix $\frac{1}{n}K$ is symmetric positive semidefinite, hence its eigenvectors are orthogonal and the corresponding real eigenvalues are non-negative. Let the $k$ positive dominant eigenvalues of $\frac{1}{n}K$ be denoted by $\lambda_1 \ge \ldots \ge \lambda_k > 0$ and the corresponding normalized eigenvectors by $\alpha^1, \ldots, \alpha^k$. Then the transformation matrix $Q^\phi$ can be calculated via:



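$$Q^\phi := \frac{1}{\sqrt{n}}\Big[\lambda_1^{-1}\sum_{i=1}^{n}\alpha^1_i\,\phi'(x_i), \;\ldots,\; \lambda_k^{-1}\sum_{i=1}^{n}\alpha^k_i\,\phi'(x_i)\Big] \qquad (14)$$

where the factors $n^{-1/2}$ and $\lambda_j^{-1}$ make the $j$th column of $Q^\phi$ equal to the unit eigenvector of $C^\phi$ scaled by $\lambda_j^{-1/2}$, exactly as in the linear case⁵.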

Transformation of Test Vectors. Let $y \in X$ be an arbitrary test vector. New features can be expressed as $\phi(y)^* = W Q^{\phi\top}(\phi(y) - \mu^\phi)$, where $Q^\phi$ denotes the matrix we obtained from whitening, while $W$ denotes the orthogonal transformation matrix we got as the output of Kernel-FastICA. Practically speaking, Kernel-FastICA = Kernel-Centering + Kernel-Whitening + the iterative section of the original FastICA. The computation of $\phi(y)^*$ involves only dot products:



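$$\phi(y)^* = W\,\frac{1}{\sqrt{n}}\Big(\lambda_1^{-1}\sum_{i=1}^{n}\alpha^1_i c_i, \;\ldots,\; \lambda_k^{-1}\sum_{i=1}^{n}\alpha^k_i c_i\Big)^{\!\top} \qquad (15)$$

where

$$c_i = \phi'(x_i)\cdot\phi'(y) = k(x_i, y) - \frac{1}{n}\sum_{j=1}^{n}\big(k(x_i, x_j) + k(x_j, y)\big) + \frac{1}{n^2}\sum_{t=1}^{n}\sum_{j=1}^{n}k(x_t, x_j) \qquad (16)$$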
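A NumPy sketch of kernel whitening and of (15) and (16), again with the illustrative kernel $k(x, y) = (x \cdot y)^2$ so that the result can be compared with an explicit feature map. It solves $\frac{1}{n}K\alpha = \lambda\alpha$ for the centred Gram matrix, keeps the $k$ dominant eigenpairs (here $k = 3$, the dimension of this particular $\mathcal{F}$), and computes $Q^{\phi\top}\phi'(y)$ from kernel values only. The final multiplication by $W$ is omitted: it is the unchanged FastICA iteration applied to these coordinates. All names are our own.

```python
import numpy as np

def kernel_whitening(K, k):
    """Solve (1/n) K' alpha = lam alpha for the centred Gram matrix K' of (13).

    Returns the k dominant eigenvalues and the unit-norm eigenvectors alpha^1..alpha^k
    as the columns of an n x k array.
    """
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    K_c = K - one @ K - K @ one + one @ K @ one
    lam, alpha = np.linalg.eigh(K_c / n)
    order = np.argsort(lam)[::-1][:k]
    return lam[order], alpha[:, order]

def kernel_whitened_features(K_train, K_test, lam, alpha):
    """Q^T phi'(y) for every test point y, from kernel values only, as in (15)-(16).

    K_train[i, j] = k(x_i, x_j) and K_test[y, i] = k(y, x_i).
    """
    n = K_train.shape[0]
    # c_i(y) = k(x_i, y) - (1/n) sum_j (k(x_i, x_j) + k(x_j, y)) + (1/n^2) sum_{t,j} k(x_t, x_j)
    c = (K_test - K_train.mean(axis=1)[None, :]
         - K_test.mean(axis=1)[:, None] + K_train.mean())
    return c @ alpha / (np.sqrt(n) * lam)      # column j scaled by n^(-1/2) lam_j^(-1)

def kern(A, B):
    return (A @ B.T) ** 2                      # illustrative kernel k(x, y) = (x . y)^2

def phi(A):
    return np.column_stack([A[:, 0] ** 2, np.sqrt(2.0) * A[:, 0] * A[:, 1], A[:, 1] ** 2])

rng = np.random.default_rng(4)
X, Y = rng.standard_normal((200, 2)), rng.standard_normal((5, 2))
lam, alpha = kernel_whitening(kern(X, X), k=3)
F_train = kernel_whitened_features(kern(X, X), kern(X, X), lam, alpha)
F_test = kernel_whitened_features(kern(X, X), kern(Y, X), lam, alpha)

# Cross-check with the explicit feature map: Q from (14), then Q^T (phi(y) - mu^phi).
Phi_c = phi(X) - phi(X).mean(axis=0)
Q = Phi_c.T @ alpha / (np.sqrt(len(X)) * lam)
explicit = (phi(Y) - phi(X).mean(axis=0)) @ Q
print(np.allclose(F_test, explicit))
```

The whitened training coordinates should have unit covariance, and the kernel-only computation of the test features should agree with the explicit one.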



4 Experimental Results 

In these trials we wanted to see how well independent component analysis and its nonlinear counterpart could reduce the number of features and improve classification performance. Since automatic phoneme classification is of great importance in the computer-assisted training of the speech- and hearing-impaired, we chose phoneme classification as an area of investigation.

⁴ Here we temporarily disregard the constraint $a \ne 0$.

⁵ If we use the factors $\lambda_j^{-1/2}$ instead of $\lambda_j^{-1}$ in (14), then we obtain Kernel Principal Component Analysis.

We developed a program to support the speech training of the hearing-impaired, where the intention was to support or replace their diminished auditory feedback with a visual one. In our initial experiments we focused on the classification of vowels, as learning the vowels is the most challenging task for the hearing-impaired. The software we designed assumes that the vowels are pronounced in isolation or in the form of two-syllable words, which is a conventional training strategy. Visual feedback is provided on a frame-by-frame basis in the form of flickering letters, their brightness being proportional to the vowel recognizer's output (see Fig. 2).

Corpus. For training and testing purposes we recorded samples from 25 speakers. The speech signals were recorded and stored at a sampling rate of 22050 Hz in 16-bit quality. Each speaker uttered 59 two-syllable Hungarian words of the CVCVC form, where the consonants (C) are mostly unvoiced plosives so as to ease the detection of the vowels (V). The distribution of the 9 vowels (long and short versions were not distinguished) is approximately uniform in the database. In the trials, 20 speakers were used for training and 5 for testing.

Feature Sets. The signals were processed in 10 ms frames, from which the log-energies of 24 critical bands were extracted using FFT and triangular weighting [17]. In our early tests we only used the filter-bank log-energies from the most central frame of the steady-state part of each vowel ("FBLE" set). Then we added the derivatives of these features to model the signal dynamics ("FBLE+Deriv" set). In another experiment we smoothed the feature trajectories so as to remove the effects of short noises and disturbances ("FBLE Smooth" set). In yet another set of features we extended the log-energies with the gravity centers of four frequency bands which approximately correspond to the possible values of the formants. These gravity centers provide a crude approximation of the formants ("FBLE+Grav" set) [2].
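The two kinds of features can be illustrated with a rough NumPy sketch. The paper does not specify the window, FFT length, filter placement or band limits, so all of these are assumptions here: a Hann window, a 512-point FFT, triangular filters spaced evenly on the Bark axis (using the Zwicker-Terhardt approximation of the Bark scale), and a gravity center computed as the power-weighted mean frequency inside a band. It shows the kind of computation involved; it does not reproduce the FBLE or FBLE+Grav sets.

```python
import numpy as np

SR = 22050                     # sampling rate of the corpus (Hz)
FRAME = int(0.010 * SR)        # 10 ms frame -> 220 samples
NFFT = 512                     # assumption: zero-padded FFT length

def bark(f):
    """Assumed critical-band scale: Zwicker-Terhardt approximation of the Bark scale."""
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def power_spectrum(frame):
    windowed = frame * np.hanning(len(frame))           # assumption: Hann window
    spec = np.abs(np.fft.rfft(windowed, NFFT)) ** 2
    return np.fft.rfftfreq(NFFT, d=1.0 / SR), spec

def filterbank_log_energies(frame, n_bands=24):
    """Log-energies of n_bands triangular filters spaced evenly on the Bark axis."""
    freqs, spec = power_spectrum(frame)
    z = bark(freqs)
    centers = np.linspace(0.0, bark(SR / 2), n_bands + 2)[1:-1]   # hypothetical layout
    width = centers[1] - centers[0]
    weights = np.maximum(0.0, 1.0 - np.abs(z[None, :] - centers[:, None]) / width)
    return np.log(weights @ spec + 1e-10), centers, width

def gravity_center(frame, lo, hi):
    """Power-weighted mean frequency of the spectrum inside the band [lo, hi] Hz."""
    freqs, spec = power_spectrum(frame)
    band = (freqs >= lo) & (freqs <= hi)
    return float(np.sum(freqs[band] * spec[band]) / np.sum(spec[band]))

t = np.arange(FRAME) / SR
frame = np.sin(2 * np.pi * 1000.0 * t)                  # a 1 kHz test tone
energies, centers, width = filterbank_log_energies(frame)
print(energies.shape, round(gravity_center(frame, 500.0, 1500.0)))
```

For a 1 kHz tone, the gravity center of the 500 to 1500 Hz band should lie near 1000 Hz, and the strongest filter should be one centred close to 1 kHz on the Bark axis.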

Classifiers. In all the trials with Artificial Neural Nets (ANN) [3], the well-known three-layer feed-forward MLP networks were employed with the backpropagation learning rule. The number of hidden neurons was equal to the number of features. In the Support Vector Machine (SVM) [21] experiments we always applied the Gaussian RBF kernel function (with parameter $r = 10$).

Transformations. In our tests with ICA and Kernel-ICA, the eigenvectors belonging to the 16 dominant eigenvalues were selected as basis vectors for the transformed space, and the nonlinear function $G(\eta)$ was $\eta^4$. In Kernel-ICA the kernel function was the same Gaussian RBF kernel as for the SVM. Naturally, when we applied a certain transformation on the training set before learning, we used the same transformation on the test data during testing.
5 Results and Discussion

Table 1 shows the recognition errors. Here the rows represent the four feature 
sets, while the columns correspond to the applied transformation and classifier 
combinations. 

On examining the results, the first striking point is that although the transformations retained only 16 features, the classifiers achieved comparable scores, and with Kernel-ICA consistently better ones. The reason for this is presumably that ICA determines directions with high non-Gaussianity, which appears to be a beneficial feature extraction strategy before classification. As regards the various feature sets, we found that the gravity center features and the smoothing of the trajectories both led to a remarkable improvement in the results, while adding the derivatives generally did not increase performance (the only gain was with the untransformed ANN). Most likely, a clever combination of smoothing and taking derivatives (or RASTA filtering) could yield still better results. Another notable observation is that SVM outperformed ANN in almost every setting, by up to about four percentage points. This can mostly be attributed to the fact that the SVM algorithm can deal with overfitting, a common problem in ANN training.

Finally, since Kernel-ICA gave the lowest error rate for every feature set, we conclude that it is worthwhile to continue experimenting with this type of nonlinearity. However, the problem of finding the best kernel function for the dot product extension, or of choosing the best nonlinearity for the contrast function, remains open at present.



Table 1. Recognition errors for the vowel classification task. The numbers in parentheses give the number of features.

Transformation:     none     none     ICA (16)  ICA (16)  K-ICA (16)  K-ICA (16)
Classifier:         ANN      SVM      ANN       SVM       ANN         SVM
FBLE (24)           26.71%   22.70%   25.65%    23.84%    23.19%      22.20%
FBLE+Deriv (48)     25.82%   24.01%   28.62%    26.81%    24.67%      23.35%
FBLE+Grav (32)      24.01%   22.03%   23.68%    23.35%    20.88%      20.06%
FBLE Smooth (24)    23.68%   21.05%   23.84%    23.84%    22.03%      20.39%



6 Conclusion 

In this paper we presented a new nonlinearized version of Independent Component Analysis using a kernel approach. Encouraged by [19] and [8], we extended Kernel Principal Component Analysis (KPCA), since ICA can be viewed as a modified PCA (centering and whitening) followed by an additional iterative process. Beyond this, our experiments showed Kernel-ICA outperforming its linear counterpart on the phoneme classification task. Unfortunately, feature extraction in kernel feature spaces is currently much slower than the traditional linear version. Hence, in the near future we will focus our efforts on a sparse data representation scheme that we hope will speed up the computations. This seems to be a promising direction.
Abstract. We consider the following synchronous colouring game played on a simple connected graph with vertices coloured black or white. During one step of the game, each vertex is recoloured according to the majority of its neighbours. The variants of the model differ in the choice of a particular tie-breaking rule and of a possible rule for enforcing monotonicity. The two tie-breaking rules we consider are simple majority, which in case of a tie recolours the vertex black, and strong majority, which in case of a tie does not change the colour. The monotonicity-enforcing rule allows voting only in white vertices, thus leaving all black vertices intact. This model is called irreversible.

These synchronous dynamic systems have been extensively studied and 
have many applications in molecular biology, distributed systems mod- 
elling, etc. 

In this paper we give two results describing the behaviour of these systems on trees. First, we show that the number of fixpoints of the strong majority rule on complete binary trees is asymptotically $4N \cdot (2a)^N$, where $N$ is the number of vertices and $0.7685 < a < 0.7686$.

The second result is an algorithm for testing whether a given configuration on an arbitrary tree evolves into an all-black state under the irreversible simple majority rule. The algorithm works in time $O(t \log t)$, where $t$ is the number of black vertices, and uses labels of length $O(\log N)$.



1 Introduction 

Let us consider the following colouring game played on a simple connected graph. Initially, each vertex is assigned a colour from the set {black, white}. The game proceeds in synchronous rounds. In each round, every vertex checks the colours of all its neighbours and adopts the colour of the majority. In order to have a rigorous definition we must, however, supply a tie-breaking rule. In case of a tie there are generally four possibilities: the vertex may recolour itself white or black, keep its colour, or flip its colour. Different tie-breaking rules lead to different behaviour of the system [17]. We consider the two most studied rules, the so-called strong and simple majority. When using the simple majority rule, in case of a tie the vertex recolours itself black. In the strong majority rule, in case of a tie the vertex does not change its colour.

Systems of this kind appear in many different areas of research (e.g. gene expression networks [9], the spin glass model [6], fault-local distributed mending [10], etc.) and have been extensively studied. The most frequently addressed issues have been the periodic behaviour [7], [19], [20], [6], [13], [14], the number of fixpoints [1], and various questions concerning dynamos [11], [17], [18], [2], [4].

In this paper we address two important questions about this game. The first 
one is the number of different fixpoints (i.e. colourings which are stable under 
the majority rule). It is known (e.g. [6]) that the limit structures of this game are 
either fixpoints or have period 2. In [1] the number of fixpoints on rings was given. 
The study of this question was motivated by experiments in molecular biology 
which showed that even very large gene expression networks have only a small 
number of stable structures. In [1] it was shown that in a very simplified model 
of such networks with only the strong majority function on a ring topology, the 
number of fixpoints is an exponentially small fraction of all configurations. We 
give a similar result for the case of complete binary trees. 

The second question concerns a particular fixpoint, namely the monochro- 
matic (all-black) one. The configurations that form the basin of attraction of this 
fixpoint (i.e. the configurations for which iterative application of the majority 
rule leads to all-black configuration) are called dynamic monopolies (dynamos). 
The importance of this notion follows from the fact that if the game is viewed as a model of the behaviour of a faulty point-to-point system based on majority voting, dynamos correspond to the sets of initial faults that cause the entire system to fail. Dynamos were introduced in [11] and have since been intensively studied in the literature. Previous research has mainly focused on the size of dynamos for various topologies (see e.g. the survey paper [16]).

In this paper we address the issue of testing whether a given configuration on an arbitrary tree is an irreversible simple majority dynamo when only the identities of the black nodes are given. We use the notion of a labelling scheme ([15]), where each vertex is given a (short) label in a preprocessing phase. The input to the algorithm consists of the labels of the black vertices. We give an algorithm which decides whether the given set of black vertices is a dynamo in time $O(t \log t)$ with $O(\log N)$-bit labels, where $t$ is the number of black vertices and $N$ is the number of all vertices.

The paper is organised as follows. In Section 2 we give the asymptotic ratio 
of fixpoints of the strong majority rule in complete binary trees. In Section 3 we 
present a testing algorithm for simple irreversible dynamos in arbitrary trees. 



2 The Number of Fixpoints on Complete Binary Trees 

Given a dynamic system, the number of fixpoints (i.e. configurations that are stable under the system's action) is one of the first and most important questions to be asked. In this section we consider the strong majority rule acting on a complete binary tree.

Agur et al. [1] studied the number of fixpoints on rings under the strong majority rule. The asymptotic ratio of fixpoints (i.e. the ratio of fixpoints to all configurations) is of order $c^N$, where $N$ is the number of vertices and $c = (1 + \sqrt{5})/4$.

This result shows that only a small fraction of all configurations are fixpoints. In [1], this fact is compared with experimental results about gene expression networks. The genetic code of a cell can be imagined as a set of genes which are either active (expressed) or passive. Each gene has an associated Boolean function which decides whether this particular gene should be activated, based on the status of a number of other genes. Experiments and computer simulations showed that although the number of possible configurations of a network with size comparable to the genetic code of a cell is very large, there is only a small number of fixpoints with large attractors. In this section we ask whether this property is also preserved when there is no feedback, i.e. when the graph is acyclic.

We give the asymptotic ratio of fixpoints of the strong majority rule on complete binary trees of height $n$.

Theorem 1. Let $f_n$ be the number of different fixpoints of the strong majority rule on a complete binary tree of height $n$. Then the asymptotic ratio of fixpoints satisfies $f_n / 2^N \sim 4N \cdot a^N$ as $n \to \infty$, where $N$ is the number of vertices and $0.7685 < a < 0.7686$.

Proof. Let $t(i, n)$ be the number of different fixpoints on a tree of height $n$ such that the root has value $i$. As we are using the strong majority rule, $t(0, n) = t(1, n)$. Let us denote $t(n) = t(0, n)$. Then $f_n = 2t(n)$.

Consider a fixpoint configuration on a tree of height $n$. W.l.o.g. let the root $v$ be black. There are two possible cases: either both its children $u, w$ are black, or one of them is black and the other is white (note that in the case of a tie the colour remains unchanged). In the former case, each of $u$ and $w$ is black and has one black neighbour $v$. It follows that it is necessary and sufficient that the configuration restricted to either of the subtrees rooted at $u$ and $w$ is a fixpoint. In the latter case, let $u$ be black and $w$ white. Now $u$ is in the same situation as before; however, $w$ needs both of its children to be white. Hence each of $w$'s children is white and has one white neighbour; again a situation similar to the above. This analysis leads us to the recurrence:

$$t(0) = 1, \quad t(1) = 1, \qquad t(n) = t(n-1)^2 + 2\,t(n-1)\,t(n-2)^2$$



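which can be transformed by the substitution $q(n) = t(n)/t(n-1)^2$ to the form

$$q(1) = 1, \qquad q(n) = 1 + \frac{2}{q(n-1)}.$$

Looking for a solution of the form $q(n) = s(n)/s(n-1)$, we get $s(n) = 2^{n+1} + (-1)^n$. It follows that

$$t(n) = \prod_{i=1}^{n}\left(\frac{s(i)}{s(i-1)}\right)^{2^{n-i}} = \prod_{i=1}^{n}\left(\frac{2^{i+1} + (-1)^i}{2^{i} + (-1)^{i-1}}\right)^{2^{n-i}}.$$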
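Now let $r(n) = \sum_{i=1}^{n} 2^{-i}\ln\big(2^{i+1} + (-1)^i\big)$. Telescoping the product (note that $s(0) = 3$) gives

$$t(n) = \big(2^{n+1} + (-1)^n\big)\, e^{2^{n-1}\left(r(n-1) - \ln 3\right)}.$$

As can be easily seen, the sequence $\{r(n)\}_{n=1}^{\infty}$ converges, and we are interested in $\lim_{n\to\infty} r(n)$. Splitting $r(n)$ into odd and even $i$'s we get

$$r(n) = \sum_{i=1}^{\lfloor n/2 \rfloor}\frac{\ln(2\cdot 4^i + 1)}{4^i} + 2\sum_{i=1}^{\lceil n/2 \rceil}\frac{\ln(4^i - 1)}{4^i}.$$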

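Both of the sums converge, so we can take the limit $\lim_{n\to\infty} r(n) = r = r^{(1)} + 2r^{(2)}$, where $r^{(1)} = \sum_{i=1}^{\infty}\ln(2\cdot 4^i + 1)/4^i$ and $r^{(2)} = \sum_{i=1}^{\infty}\ln(4^i - 1)/4^i$. Let us consider the two sums separately. Each of them is of the form $\sum_{i=1}^{\infty}\ln(a\cdot 4^i + b)/4^i$ with $a \ge 1$, $b \ge -1$. As the inner function is continuous and decreasing in $(2, \infty)$, the sum can be approximated by

$$\sum_{i=1}^{p}\frac{\ln(a\cdot 4^i + b)}{4^i} < r^{(j)} < \sum_{i=1}^{p}\frac{\ln(a\cdot 4^i + b)}{4^i} + \int_{p}^{\infty}\frac{\ln(a\cdot 4^x + b)}{4^x}\,dx, \qquad p > 2.$$

The integral can be evaluated in closed form via the substitution $u = 4^{-x}$. An easy computation for $p = 10$ reveals that

$$0.87867 < r^{(1)} < 0.87869, \qquad 0.53990 < r^{(2)} < 0.53992, \qquad 1.95847 < r < 1.95853.$$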

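The asymptotic ratio of fixpoints is obtained by substituting the closed form of $t(n)$, with $2t(n) = \big(4\cdot 2^n + 2(-1)^n\big)e^{2^{n-1}(r(n-1) - \ln 3)}$, and the bounds on $r$ into $f_n/2^N = 2t(n)/2^N$, which gives the ratio $4N \cdot a^N$, where $0.7685 < a < 0.7686$.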
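The recurrence, the product formula and the numerical bounds can be checked mechanically. The Python sketch below (function names are our own; the tree is stored in heap order as an implementation choice) counts strong-majority fixpoints on small complete binary trees by brute force and compares them with $f_n = 2t(n)$, verifies the product formula for $t(n)$ in exact rational arithmetic, and sums the two series for $r^{(1)}$ and $r^{(2)}$. It does not attempt to check the final asymptotic constant.

```python
import math
from fractions import Fraction
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def t(n):
    """Fixpoints with a black root on a complete binary tree of height n (recurrence)."""
    if n <= 1:
        return 1
    return t(n - 1) ** 2 + 2 * t(n - 1) * t(n - 2) ** 2

def brute_force_fixpoints(n):
    """Count all strong-majority fixpoints by enumerating the 2^N colourings (small n)."""
    N = 2 ** (n + 1) - 1                   # heap order: children of v are 2v+1 and 2v+2
    nbrs = [[u for u in (2 * v + 1, 2 * v + 2) if u < N] + ([(v - 1) // 2] if v else [])
            for v in range(N)]
    count = 0
    for col in product((0, 1), repeat=N):
        # strong majority: v keeps its colour unless a strict majority of neighbours differ
        if all(2 * sum(col[u] != col[v] for u in nbrs[v]) <= len(nbrs[v]) for v in range(N)):
            count += 1
    return count

def s(i):
    return 2 ** (i + 1) + (-1) ** i

def t_product(n):
    """t(n) = prod_{i=1}^{n} (s(i) / s(i-1))^(2^(n-i)), in exact arithmetic."""
    result = Fraction(1)
    for i in range(1, n + 1):
        result *= Fraction(s(i), s(i - 1)) ** (2 ** (n - i))
    return result

def r_parts(terms=60):
    """Partial sums of r^(1) = sum ln(2*4^i + 1)/4^i and r^(2) = sum ln(4^i - 1)/4^i."""
    r1 = sum(math.log(2 * 4 ** i + 1) / 4 ** i for i in range(1, terms))
    r2 = sum(math.log(4 ** i - 1) / 4 ** i for i in range(1, terms))
    return r1, r2, r1 + 2 * r2

print([brute_force_fixpoints(n) for n in range(4)], [2 * t(n) for n in range(4)])
print(r_parts())
```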



3 Dynamo Testing on Trees 

Let us now focus our attention on another question concerning the colouring game: what is the complexity of deciding, given a configuration $C$, whether it belongs to the attractor of a particular fixpoint, namely the all-black one (i.e. whether $C$ is a dynamo)? The motivation behind this problem comes from the modelling of faulty networks. The configuration may be viewed as the subset of vertices that are coloured black, which represent faulty nodes. A node may become faulty if the majority of its neighbours are faulty. This stems from the fact that the state of a node depends on the messages it has received from its neighbours; a (Byzantine) faulty majority of its neighbours can hence control its behaviour.

In order to take into account the fact that faulty nodes remain faulty, a modification of the majority rule called irreversible majority has been widely studied [3,4,5,12]. Using this rule, the (simple or strong) majority voting is applied only in white vertices. The black ones remain black throughout the whole execution.

The configurations in the attractor of the all-black fixpoint are called dy- 
namos: 

Definition 1. A set of black vertices $M$ is called an (irreversible) dynamo if and only if the (irreversible) majority computation starting from $M$ colours the whole graph black.

In our view of the black vertices as faulty processors, a dynamo represents a 
state of the system which eventually leads to a complete crash. Therefore it is 
important to be able to test whether the given configuration forms a dynamo. 

The problem we study is the following: 

Definition 2. Testing is the problem of deciding, given a graph $G$ and a set $M$ of black vertices, whether $M$ is an irreversible dynamo.

It is easy to see that since the black vertices are never recoloured white, the sequential simulation of the whole computation can be done in $O(m)$ time.

Observation 1. There is an $O(m)$ algorithm for Testing in arbitrary graphs, where $m$ is the number of edges.

Proof. Consider the edge boundary $\partial(M)$ of $M$. During the computation of the system, each edge is added to and withdrawn from $\partial(M)$ at most once. □

Corollary 1. The complexity of Testing in arbitrary graphs is $O(m)$.
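A sketch of the $O(m)$ simulation behind Observation 1, in Python with the graph given as an adjacency dict (our own representation and naming). Because black vertices never turn white, the set of black vertices only grows, so processing newly black vertices from a queue reaches the same final set as the synchronous rounds, and each edge is examined a constant number of times. The thresholds encode the tie rules: a white vertex of degree $d$ needs $\lceil d/2 \rceil$ black neighbours under simple majority and $\lfloor d/2 \rfloor + 1$ under strong majority.

```python
from collections import deque

def is_irreversible_dynamo(adj, black, rule="simple"):
    """Sequentially simulate irreversible majority; True if every vertex ends up black.

    adj: dict mapping each vertex of a simple connected graph to its neighbours.
    rule: "simple" (a tie turns a white vertex black) or "strong" (a tie keeps it white).
    """
    def needed(v):
        d = len(adj[v])
        return (d + 1) // 2 if rule == "simple" else d // 2 + 1

    is_black = {v: v in black for v in adj}
    black_nbrs = dict.fromkeys(adj, 0)          # black neighbours counted so far
    queue = deque(v for v in adj if is_black[v])
    while queue:                                 # every vertex is dequeued at most once
        v = queue.popleft()
        for u in adj[v]:                         # so each edge is scanned at most twice
            black_nbrs[u] += 1
            if not is_black[u] and black_nbrs[u] >= needed(u):
                is_black[u] = True
                queue.append(u)
    return all(is_black.values())

star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(is_irreversible_dynamo(star, {1, 2}), is_irreversible_dynamo(star, {1, 2}, "strong"))
```

On the star with four leaves, two black leaves suffice under simple majority (a tie at the centre turns it black) but not under strong majority.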

It is, however, often desirable to measure the complexity of Testing not in terms of the size of the entire graph but in terms of the number of black nodes. This corresponds to a situation in which the only information about the state of the system consists of some identifiers of the faulty nodes (the actual topology is unknown except for the fact that it is acyclic). We want an algorithm which, given the identifiers of the black vertices, decides whether the current state is a dynamo. As we consider the topology to be fixed, we allow some preprocessing to choose the vertex labels. In order to achieve this goal we use the notion of labelling schemes.

Informally, a labelling scheme allows a preprocessing phase in which each vertex of a graph receives a (short) label. The aim is, given the labels of some vertices, to compute the value of a function defined over subsets of vertices of the graph. Previous research mainly involves distance labelling schemes [8], where the function to be computed is the distance between two vertices (and some default value for all other subsets of vertices), and labelling schemes for least common ancestor and Steiner tree on trees [15].

Other labelling schemes have been used in various application areas, e.g. compact routing using ILS [22,23,21].

Definition 3. An $f(N)$-labelling scheme is a polynomially computable function that assigns to each vertex of an $N$-node graph $G$ a binary string of length at most $f(N)$.
The local version of the Testing problem, in which only the labels of the faulty nodes are known, can be described as follows:

Definition 4. Let $\mathcal{L}$ be a labelling scheme. $\mathcal{L}$-Testing is the problem of telling, given a set of $t$ labels of some nodes of $G$, whether these nodes form an irreversible dynamo in $G$.

$f(N)$-Testing is the problem of devising an $f(N)$-labelling scheme $\mathcal{L}$ and an algorithm for $\mathcal{L}$-Testing.

Next, we give an efficient $O(\log N)$-Testing algorithm on trees. In the phase of assigning labels, we shall use the following lemma:

Lemma 1. Consider a string $\alpha$, $|\alpha| = m$. Then for any $n \ge 2$ and any $k \le n$ there are strings $\beta_1, \ldots, \beta_n$ such that from any choice of $k$ strings $\beta_{i_1}, \ldots, \beta_{i_k}$ it is possible to reconstruct $\alpha$. Moreover, for each $\beta_i$ it holds $|\beta_i| \le \lceil m(1 - \frac{k-1}{n}) \rceil$.

Proof. Consider an $n \times m$ matrix $M$ of zeroes and ones. Let the $z$-th column contain all ones except for a consecutive block of $k - 1$ zeroes starting from row $1 + (z-1)(k-1)$ (taken cyclically modulo $n$). Let $\beta_i$ be the sequence of those bits $\alpha_j$ of $\alpha$ for which $M_{ij} = 1$. Because each column of $M$ contains $k - 1$ zeroes, for any $k$ strings and each bit there is at least one $\beta_i$ among them containing that bit. Moreover, as the numbers of ones in any two rows differ by at most 1, the maximal number of ones in one row is $\lceil m(n-k+1)/n \rceil = \lceil m(1 - \frac{k-1}{n}) \rceil$. □
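The construction in the proof of Lemma 1 can be sketched in Python as follows. Indices are 0-based, we assume the block of zeros wraps around cyclically, and the helper names are hypothetical. Since the matrix $M$ is fixed, a holder of $\beta_i$ only needs to know $i$ to put the bits back in place.

```python
from itertools import combinations

def zero_rows(z, n, k):
    """Rows (0-based) holding the k-1 zeros of column z: a cyclic block starting at z(k-1)."""
    start = (z * (k - 1)) % n
    return {(start + r) % n for r in range(k - 1)}

def disperse(alpha, n, k):
    """beta_i consists of the bits alpha[z] with M[i][z] = 1, for i = 0, ..., n-1."""
    return ["".join(bit for z, bit in enumerate(alpha) if i not in zero_rows(z, n, k))
            for i in range(n)]

def reconstruct(pieces, m, n, k):
    """Rebuild alpha (length m) from a dict {i: beta_i} containing at least k pieces."""
    alpha = [None] * m
    for i, beta in pieces.items():
        columns = [z for z in range(m) if i not in zero_rows(z, n, k)]
        for z, bit in zip(columns, beta):
            alpha[z] = bit
    if None in alpha:
        raise ValueError("too few pieces to reconstruct alpha")
    return "".join(alpha)

def length_bound(m, n, k):
    """ceil(m (1 - (k-1)/n)) = ceil(m (n-k+1) / n), in integer arithmetic."""
    return -(-m * (n - k + 1) // n)

alpha = "1011001110"
betas = disperse(alpha, n=5, k=3)
print(all(reconstruct({i: betas[i] for i in rows}, len(alpha), 5, 3) == alpha
          for rows in combinations(range(5), 3)))
print(max(map(len, betas)), length_bound(len(alpha), 5, 3))
```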

The algorithm in the following theorem uses $O(\log N)$-bit labels in order to test the given configuration in time $O(t \log t)$, where $t$ is the number of black vertices. The running time of this algorithm on configurations of size $\omega(N/\log N)$ is worse than that of the trivial testing algorithm equipped with full topology information. This is because the input may contain the labels of black vertices in any order, which requires a sorting phase. However, on configurations with $O(N/\log N)$ black vertices, the running time of this algorithm is $O(N)$, i.e. linear. Note that, as there are trees for which some configurations of size $O(N/\log N)$ are dynamos and some are not, it is not possible to solve the problem based on the number of black vertices only.

Theorem 2. Let the unit-cost operations be operations on $\log N$-bit words. Then there exists an $O(t \log t)$-time algorithm for irreversible $O(\log N)$-Testing in arbitrary trees using simple (strong) majority.

Proof. Given an $N$-vertex tree $T$, we show how to construct $O(\log N)$-bit labels $L_v$ for each vertex $v$ and how to test for a dynamo in time $O(t \log t)$ given $t$ labels.

First consider the case when every vertex $v$ in $T$ has $\deg_v \ne 2$. Suppose there is a unique identifier $ID_v$ for each vertex $v$ such that $|ID_v| = O(\log N)$. Choose an arbitrary vertex to be the root. In each vertex, choose a fixed ordering of its children. By "level $i$" we mean the set of vertices at distance $i$ from the root. To each vertex $v$ assign a local label $L'_v = \langle ID_v, \deg_v, ID_{parent_v}, \deg_{parent_v}, level_v, s_v, l_v, c_v\rangle$, where $s_v$ denotes $v$'s number in its parent's ordering of children, $l_v$ denotes the number of vertices with degree more than one in the same level as $v$, and $c_v$ is true if all children of $v$ are leaves. For the root, the values $ID_{parent_{root}}$ and $ID_{root}$ are equal. Clearly, the labels $L'_v$ can be constructed in $O(N \log N)$ time. The labels $L_v$ are constructed recursively as follows. For the root, $L_{root} = L'_{root}$. Now consider a vertex $v$ with label $L_v$ such that $\deg_v = d + 1$. If $d = 2$, let $\beta_1$ be the first half of $L_v$ and $\beta_2$ the second half. If $d > 2$, choose $\beta_1, \ldots, \beta_d$ according to Lemma 1 in such a way that it is possible to reconstruct $L_v$ from any $\lceil (d+1)/2 \rceil$ of them. Let $w$ be the $i$-th son of $v$ in $v$'s ordering. Then $L_w = \langle L'_w, \beta_i\rangle$. By Lemma 1, the length of each $\beta_i$ is at most $\lceil \frac{2}{3}|L_v| \rceil$. Hence the length of a label is bounded from above by $6L + 6$, where $L$ is the maximal length of a local label, i.e. $L = O(\log N)$.

The testing algorithm proceeds as follows. Given $t$ labels, sort them according
to $level_v$. Start with the bottom-most level. For each initially black vertex
$v$, check whether its local $C_v$ is true; if not, the configuration is not a
dynamo. Count the number of initially black vertices with degree more than one.
Using the value $l_v$, if there are some vertices with degree more than one in the
bottom-most level which are not initially black, the configuration is not a
dynamo. Now suppose that level $l$ has already been checked. Level $l - 1$ is
checked as follows. Construct a list $\mathcal{C}$ of the IDs of all parents of
all black vertices from level $l$. For each of them, the information about its ID,
its degree and the number of its black children is available from the labels of
vertices in level $l$. For each initially black vertex in level $l - 1$ not in
$\mathcal{C}$, check its $C_v$ value. Assign a colour to each vertex
$v \in \mathcal{C}$ as follows: if $v$ has at least $\lceil (\deg(v) + 1)/2 \rceil$
(for strong majority) or $\lceil \deg(v)/2 \rceil$ (for simple majority) black
children (or is initially black), colour $v$ black. If $v$ has exactly
$\lfloor \deg(v)/2 \rfloor$ (for strong majority) or $\lceil \deg(v)/2 \rceil - 1$
(for simple majority) black children, colour $v$ gray. Otherwise $v$ is white and
the configuration is not a dynamo. For all $v \in \mathcal{C}$ that have been
coloured black, construct their $L_v$ from the labels of their children. At this
stage, there must be at least one black vertex in level $l - 1$. Extract $l_v$
from its label. Using this value and the number of black and gray vertices, check
for the existence of a white vertex at level $l - 1$. If the root is coloured
black, the configuration is a dynamo.

First note that the algorithm colours a vertex black only if it becomes black at
some point during the computation. Similarly, if the algorithm colours a vertex
white, it never becomes black during the computation. The gray vertices may
become black in the computation only if there is a gray ancestor path ending in a
black vertex. Then the whole gray subtree is coloured black.

In each level all vertices are considered: either they are in $\mathcal{C}$, or
they are white non-leaves and are detected using the $l_v$ value, or they are
white leaves and are detected in the next level from their parent. Hence the
algorithm is correct.

Since the explicitly constructed black vertices form a tree with degree $\neq 2$
and the number of gray vertices is at most linear in the number of black ones,
the dominating term in the time complexity comes from the initial sorting.
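The colouring rules can be cross-checked on small trees. The Python sketch below is not the label-based tester: it assumes the whole tree is available as an adjacency dict, processes vertices from the deepest level upwards with the black/gray/white rules (black threshold $\lceil \deg(v)/2 \rceil$ for simple and $\lfloor \deg(v)/2 \rfloor + 1$ for strong majority), and compares the verdict with a naive simulation of irreversible majority. The names `threshold`, `simulate`, `dynamo_bottom_up` and `cross_check` are illustrative only; unlike the label construction, the sketch also accepts vertices of degree 2.

```python
from itertools import product


def threshold(deg: int, strong: bool) -> int:
    """Black neighbours a vertex needs: more than half (strong) or at least half (simple)."""
    return deg // 2 + 1 if strong else (deg + 1) // 2


def simulate(adj: dict, black: set, strong: bool) -> bool:
    """Naive reference: iterate irreversible majority to a fixpoint; dynamo iff all black."""
    black = set(black)
    while True:
        new = {v for v in adj if v not in black
               and sum(w in black for w in adj[v]) >= threshold(len(adj[v]), strong)}
        if not new:
            return len(black) == len(adj)
        black |= new


def dynamo_bottom_up(adj: dict, black: set, strong: bool, root: int = 0) -> bool:
    """Colour vertices black/gray/white from the deepest level up (full tree known)."""
    parent, order = {root: None}, [root]
    for v in order:                      # BFS; `order` grows while we iterate over it
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                order.append(w)
    colour = {}
    for v in reversed(order):            # children are coloured before their parent
        if v in black:
            colour[v] = "black"
            continue
        k = sum(colour[w] == "black" for w in adj[v] if w != parent[v])
        need = threshold(len(adj[v]), strong)
        if k >= need:
            colour[v] = "black"          # turns black without help from its parent
        elif k == need - 1 and parent[v] is not None:
            colour[v] = "gray"           # turns black exactly when its parent does
        else:
            return False                 # white vertex or gray root: never turns black
    # No white vertex and a black root: every gray vertex has a gray ancestor
    # path ending in a black vertex, so the whole tree eventually turns black.
    return True


def cross_check(max_n: int) -> bool:
    """Compare both verdicts on all trees given by parent arrays (parent[i] < i)."""
    for n in range(2, max_n + 1):
        for parents in product(*(range(i) for i in range(1, n))):
            adj = {v: [] for v in range(n)}
            for child, par in enumerate(parents, start=1):
                adj[child].append(par)
                adj[par].append(child)
            for mask in range(1 << n):
                black = {v for v in range(n) if (mask >> v) & 1}
                for strong in (False, True):
                    if simulate(adj, black, strong) != dynamo_bottom_up(adj, black, strong):
                        return False
    return True
```

For example, on the path 0 - 1 - 2 with only vertex 2 initially black, simple majority yields a dynamo while strong majority does not, since vertex 1 then needs both of its neighbours black.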

Now consider the case when some vertices may have degree 2. In the phase of
assigning labels, the following changes are made. First, before a root is chosen,
every path consisting of vertices of degree 2 is replaced by a “supernode”. This
“supernode” is treated differently in the simple and strong models.

In simple majority, each supernode acts as one node, i.e. all nodes of a supernode
have the same label. This is because in the computation the whole chain turns
black exactly when one of its member vertices is black. So now there are only two
cases of vertices with degree 2: the root, or a vertex with one parent and one
child, both of degree $\neq 2$. For the root the algorithm works properly.
Consider a vertex $u$ with parent $v$ and child $w$. As vertex $u$ can be coloured
by either $w$ or $v$, it is sufficient that the parent information and the
additional bits used to reconstruct the parent's label are set in both $u$ and $w$
to “point to” $v$. Moreover, $L'_w$ may contain additional information about both
$w$'s and $u$'s levels.

In the strong majority model, the nodes of each supernode have additional
identifiers, and each supernode is checked whether it can be coloured, i.e. all
nodes in the supernode except for the first and the last must be coloured black.
The situation is then similar to the case above, with the difference that the
first and last nodes of a supernode are treated separately. □
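The preprocessing step that replaces every path of degree-2 vertices by a supernode can be sketched as follows, assuming the tree is given as an adjacency dict with integer vertices. `degree_two_paths` and the hypothetical helper `supernode_map` (which realises the simple-majority convention that all nodes of a supernode share one label) are illustrative names, not part of the original construction.

```python
def degree_two_paths(adj: dict) -> list:
    """Maximal paths of degree-2 vertices in a tree, each listed in path order."""
    deg2 = {v for v, nbrs in adj.items() if len(nbrs) == 2}
    paths, used = [], set()
    for v in sorted(deg2):
        inner = [w for w in adj[v] if w in deg2]
        if v in used or len(inner) == 2:
            continue                      # start walking only from an endpoint of a path
        path, prev, cur = [], None, v
        while cur is not None:
            path.append(cur)
            used.add(cur)
            nxt = [w for w in adj[cur] if w in deg2 and w != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
        paths.append(path)
    return paths


def supernode_map(adj: dict) -> dict:
    """Hypothetical helper: map each vertex to its supernode representative (itself otherwise)."""
    rep = {v: v for v in adj}
    for path in degree_two_paths(adj):
        for v in path:
            rep[v] = path[0]
    return rep
```

In a tree the degree-2 vertices induce disjoint paths, so each walk from an endpoint collects exactly one supernode.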

4 Conclusion 

We have addressed two questions concerning the iterated majority voting system.
The first one concerns the number of fixpoints of iterated majority voting on
a complete binary tree. The result is comparable to similar results on rings
from [1]. An interesting question for future research may be to show how the
number of fixpoints varies over different topologies with different models
(simple/strong, reversible/irreversible majority).

The other question we asked is the complexity of dynamo testing in trees.
We showed that it is possible to test whether a configuration is a dynamo in time
$O(t \log t)$, where $t$ is the size of the configuration (i.e. the number of its
black vertices), using $O(\log N)$-bit labels. It might be interesting to find
other topologies for which an $O(\log N)$-Testing algorithm with time complexity
$f(N)$ exists such that $f(d) = o(m)$, where $d$ is the size of a minimal dynamo
and $m$ is the number of edges.
Abstract. In the context of quantum computing, reversible computations play an
important role. In this paper the model of the reversible pebble game introduced
by Bennett is considered. The reversible pebble game is an abstraction of
reversible computation that allows one to examine the space and time complexity
of various classes of problems. We present techniques for proving lower and upper
bounds on time and space complexity. Using these techniques we show a partial
lower bound on time for optimal space (time for optimal space is not
$o(n \lg n)$) and a time-space tradeoff (space $O(\sqrt[k]{n})$ for time $2^k n$)
for a chain of length $n$. Further, we show a tight optimal space bound
($h + \Theta(\lg^* h)$) for a binary tree of height $h$, and we discuss the space
complexity of a butterfly. These results give evidence that reversible
computations need more resources than standard irreversible computations.



1 Introduction 

The standard pebble game was introduced as a graph-theoretic model that enables
one to analyse the time-space complexity of deterministic computations. In this
model, the values to be computed are represented by vertices of a directed
acyclic graph. An edge from a vertex $a$ to a vertex $b$ represents the fact that
for computing the value $b$, the value $a$ has to be already known. Computation
is modelled by laying pebbles on and removing them from the vertices. Pebbles
represent memory locations. A pebble lying on a certain vertex represents the
fact that the value of this vertex has already been computed and stored in
memory.

The importance of the pebble game lies in the following two-step paradigm:

1. the inherent structure of the studied problem forms a class of acyclic graphs;
investigate the time-space complexity of pebbling this class of graphs;

2. apply the obtained time-space results to create a time-efficient,
space-restricted computation of the original problem.

Various modifications of this game were studied in connection with different
models of computation (e.g. the pebble game with black and white pebbles for
nondeterministic computations, the two-person pebble game for alternating
computations, the pebble game with red and blue pebbles for input-output
complexity analysis, the pebble game with labels for database serializability
testing, etc., see [7]).

* Supported in part by grant from VEGA 1/7155/20.

In connection with quantum computing, the model of reversible computation is
very interesting. As the basic laws of quantum physics are reversible, quantum
computation also has to be reversible. That means that each state of the
computation has to uniquely determine both the following and the preceding state
of the computation.

Another motivation to examine the model of reversible computation follows from
the fact that reversible operations are not known to require any heat
dissipation. With the continuing miniaturisation of computing devices, reducing
energy dissipation becomes very important. Both these reasons for studying
reversible computations are mentioned in [9], [2], [5] and [4].

A modification of the standard pebble game for modelling reversible computations
is the reversible pebble game. The reversible pebble game enables one to analyse
time and space complexity and time-space trade-offs of reversible computations.

In this paper, three basic classes of dags are considered: the chain topology, 
the complete binary tree topology and the butterfly topology. These topologies 
represent the structure of the most common problems. 

It is evident that the minimal space complexity of the standard pebble game on
the chain topology is $O(1)$, the minimal time complexity is $O(n)$, and the
minimal space and time complexities can be achieved simultaneously. For the
reversible pebble game, [4] proved that the minimal space complexity on the chain
topology is $\Theta(\lg n)$, together with an upper bound of $O(n^{\lg 3})$ on the
time complexity for optimal space. In [1] a pebbling strategy was introduced that
yields an upper bound on the time-space tradeoff for the reversible pebble game
on a chain of the form: space $O(\frac{k-1}{\lg k} \lg n)$ versus time
$O(n^{\lg(2k-1)/\lg k})$.

We show that the optimal time for optimal space cannot be $o(n \lg n)$.
Further, we show an upper bound on the time-space tradeoff for the reversible
pebble game on a chain of the form: space $O(\sqrt[k]{n})$ versus time $2^k n$.

The minimal space complexity $h + 1$ of the standard pebble game on a complete
binary tree of height $h$ was proved in [6]. We show a tight space bound for the
reversible pebble game on a complete binary tree of the form
$h + \Theta(\lg^* h)$. These results give evidence that reversible computation
needs more resources than irreversible computation.



2 Preliminaries 

The reversible pebble game is played on directed acyclic graphs. Let $G$ be a
dag. A configuration on $G$ is a set of its vertices covered by pebbles. Let $C$
be a configuration; $C(v) = 1$ denotes the fact that the vertex $v$ is covered by
a pebble. Analogously, $C(v) = 0$ denotes that the vertex $v$ is uncovered. We
denote the number of pebbles used in a configuration $C$ by $\#(C)$. The empty
configuration on $G$, i.e. the configuration without pebbles, is denoted by
$E(G)$. The rules of the reversible pebble game are the following:
R1 A pebble can be laid on a vertex $v$ if and only if all direct predecessors of
the vertex $v$ are covered by pebbles.

R2 A pebble can be removed from a vertex $v$ if and only if all direct
predecessors of the vertex $v$ are covered by pebbles.

The reversible pebble game differs from the standard pebble game in rule R2: in
the standard pebble game, pebbles can be removed from any vertex at any time.

An ordered pair of configurations on a dag $G$ such that the second one follows
from the first one according to these rules is called a transition.

For our purposes, a transition can also be a pair of two identical
configurations. A nontrivial transition is a transition not formed by identical
configurations.

An important property of a transition in the reversible pebble game is symmetry.
It follows from the rules of the game that if $(C_1, C_2)$ forms a transition,
then $(C_2, C_1)$ also forms a transition.

A computation on a graph $G$ is a sequence of configurations on $G$ such that
each successive pair forms a transition. Let $C$ be a computation; $C(i)$ denotes
the $i$-th configuration in the computation $C$. A computation $C$ is a complete
computation if and only if the first and the last configurations of $C$ are empty
(i.e. $\#(C(1)) = \#(C(n)) = 0$, where $n$ is the length of the computation $C$)
and for each vertex $v$ there exists a configuration in $C$ in which $v$ is
covered.

We shall be interested in the space and time complexities of a computation $C$.
The space of a computation $C$ (denoted $S(C)$) is the number of pebbles needed
to perform the computation, that is, the maximum number of pebbles used over all
configurations of $C$. The time of a computation $C$ (denoted $T(C)$) is the
number of nontrivial transitions in $C$.
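These definitions can be mirrored in a few lines of Python. The sketch assumes that a configuration is a frozenset of covered vertices, that a dag is a dict mapping each vertex to the list of its direct predecessors, and that a nontrivial transition lays or removes exactly one pebble; all function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence

Config = FrozenSet[int]            # covered vertices: C(v) = 1 iff v is in the set
Dag = Dict[int, List[int]]         # vertex -> list of its direct predecessors


def is_transition(g: Dag, c1: Config, c2: Config) -> bool:
    """Identical configurations, or one pebble laid (R1) or removed (R2) on a
    vertex whose direct predecessors are all covered."""
    diff = c1 ^ c2
    if not diff:
        return True
    if len(diff) != 1:
        return False
    (v,) = diff
    return all(u in c1 for u in g[v])


def is_computation(g: Dag, comp: Sequence[Config]) -> bool:
    return all(is_transition(g, a, b) for a, b in zip(comp, comp[1:]))


def is_complete(g: Dag, comp: Sequence[Config]) -> bool:
    """Starts and ends empty, and every vertex is covered at some point."""
    if not comp or comp[0] or comp[-1]:
        return False
    return is_computation(g, comp) and frozenset().union(*comp) == frozenset(g)


def space_of(comp: Sequence[Config]) -> int:
    return max(len(c) for c in comp)                       # S(C)


def time_of(comp: Sequence[Config]) -> int:
    return sum(a != b for a, b in zip(comp, comp[1:]))     # T(C): nontrivial transitions
```

For instance, on the two-vertex chain with edge (1, 2), the sequence $\emptyset, \{1\}, \{1,2\}, \{1\}, \emptyset$ is complete with space 2 and time 4, while $\emptyset, \{1\}, \{1,2\}, \{2\}, \emptyset$ is rejected because the pebble on vertex 2 is removed after vertex 1 has been uncovered, which R2 forbids.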

The minimal space of the reversible pebble game on the dag $G$ (denoted
$S_{min}(G)$) is the minimum of $S(C)$ over all complete computations $C$ on $G$.
The time $T(G, s)$ of the reversible pebble game on the dag $G$ with at most $s$
pebbles is the minimum of $T(C)$ over all complete computations $C$ on $G$ such
that $S(C) \le s$.

Let $\mathcal{G}$ be a class of dags. Then the minimal space function
$S_{min}(n)$ of the class $\mathcal{G}$ is the maximum of $S_{min}(G)$ over all
dags $G$ in the subclass $\mathcal{G}_n$. The time function $T(n, s)$ is the
maximum of $T(G, s)$ over all dags $G$ in the subclass $\mathcal{G}_n$.

2.1 Operations on Computations 

For proving upper and lower bounds on the time and space complexities of the
reversible pebble game, it is useful to manipulate reversible computations
formally. We use an algebraic way to describe computations; an advantage of this
approach is the high precision of the description. In this section we introduce
some operations for constructing and modifying computations.

To change the state of a particular vertex in a configuration, we use the
operation Put.

Definition 1. Let $G = (V, E)$ be a dag and $C$ a configuration on $G$. Let
$v \in V$ and $b \in \{0, 1\}$. Then $Put(C, v, b)$ is the configuration on $G$
defined as follows:

— $Put(C, v, b)(u) = C(u)$ if $u \in V$ and $u \neq v$;

— $Put(C, v, b)(u) = b$ if $u \in V$ and $u = v$.

An important property of reversible computations is the following: let $G$ be a
dag, $G'$ a subgraph of $G$ and $C$ a computation on $G$. If we remove all
vertices not in $G'$ from all configurations of $C$, we obtain a reversible
computation on $G'$. The correctness of this construction is clear: we cannot
violate any rule of the reversible pebble game by removing a vertex from all
configurations of a computation. Another important fact is that removing some
configurations from the beginning and the end of a reversible computation does
not violate the property of being a reversible computation either.

Also, we can define an operator for a “restriction” of a computation: 

Definition 2. Let $G = (V, E)$ be a dag and $V' \subseteq V$. Let $C$ be a
computation of length $n$ on $G$. The restriction $C' = Rst(C, i, j, V')$ of the
computation $C$ to an interval $\{i \dots j\}$ ($1 \le i \le j \le n$) and to a
subgraph $G' = (V', E \cap (V' \times V'))$ is the computation $C'$ of length
$j - i + 1$ on $G'$ defined as follows:

$(\forall k \in \{1 \dots j - i + 1\})(\forall v \in V')\ C'(k)(v) = C(i + k - 1)(v)$

We use the notation $Rst(C, i, j)$ when no vertices are removed (i.e.
$Rst(C, i, j) = Rst(C, i, j, V)$ for the graph $G = (V, E)$).

It follows from the symmetry of the rules of the reversible pebble game that
reversing a reversible computation preserves the property of being a reversible
computation. We can therefore define an operator Rev.

Definition 3. Let $C$ be a computation on $G$ of length $n$. Then the reverse of
the computation $C$ (denoted $Rev(C)$) is the computation on $G$ defined as
follows:

$(\forall i \in \{1 \dots n\})\ Rev(C)(i) = C(n + 1 - i)$

Now we introduce operations that are, in some sense, inverse to restriction.

Definition 4. Let $C_1$ and $C_2$ be computations on a dag $G$ of lengths $n_1$
and $n_2$, respectively. Let $C_1(n_1)$ and $C_2(1)$ form a transition. Then the
join of the computations $C_1$ and $C_2$ (denoted $C_1 + C_2$) is the
computation on $G$ of length $n_1 + n_2$ defined as follows:

- $(C_1 + C_2)(i) = C_1(i)$ if $i \le n_1$;

- $(C_1 + C_2)(i) = C_2(i - n_1)$ if $i > n_1$.

It is clear that this definition is correct. The configurations
$(C_1 + C_2)(n_1)$ and $(C_1 + C_2)(n_1 + 1)$ form a transition by assumption.
All other successive pairs of configurations form transitions because $C_1$ and
$C_2$ are computations.

Let $C$ be a configuration on a dag $G$. Then we can also regard $C$ as a
computation of length 1, with $C(1) = C$. Therefore we can also join a
computation with a configuration and vice versa.
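The operations Put, Rst, Rev and the join can be mirrored directly on lists of configurations. A minimal sketch, assuming configurations are frozensets of covered vertices, the dag is a dict from each vertex to its direct predecessors, and indices are 1-based as in the definitions; the function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional

Config = FrozenSet[int]
Dag = Dict[int, List[int]]          # vertex -> direct predecessors


def is_transition(g: Dag, c1: Config, c2: Config) -> bool:
    """Identical, or exactly one vertex toggled while its predecessors are covered."""
    diff = c1 ^ c2
    return not diff or (len(diff) == 1 and all(u in c1 for u in g[next(iter(diff))]))


def put(c: Config, v: int, b: int) -> Config:
    """Put(C, v, b): set the state of vertex v to b in {0, 1}."""
    return c | {v} if b == 1 else c - {v}


def rst(comp: List[Config], i: int, j: int,
        vertices: Optional[Iterable[int]] = None) -> List[Config]:
    """Rst(C, i, j, V'): configurations i..j (1-based, inclusive), restricted to V'."""
    part = comp[i - 1:j]
    if vertices is None:
        return list(part)
    keep = frozenset(vertices)
    return [c & keep for c in part]


def rev(comp: List[Config]) -> List[Config]:
    """Rev(C)(i) = C(n + 1 - i)."""
    return comp[::-1]


def join(g: Dag, c1: List[Config], c2: List[Config]) -> List[Config]:
    """C1 + C2, defined only if C1(n1) and C2(1) form a transition."""
    if not is_transition(g, c1[-1], c2[0]):
        raise ValueError("C1(n1) and C2(1) do not form a transition")
    return c1 + c2
```

A configuration used as a computation of length 1 is simply the one-element list `[c]`, so joining a computation with a configuration needs no special case.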

The join of two computations is an inverse operation to restriction by re- 
moving configurations. Now we define an inverse operation to the restriction 
performed by removing vertices. 
Definition 5. Let $G = (V, E)$ be a dag, $V_1 \subseteq V$, $V_2 \subseteq V$,
$V_1 \cap V_2 = \emptyset$. Let $D$ be a configuration on the graph
$(V_2, E \cap (V_2 \times V_2))$. Let
$\{(v, w) \mid v \in V_1 \wedge w \in V_2 \wedge D(w) = 0 \wedge (w, v) \in E\} = \emptyset$.
Let $C$ be a computation of length $n$ on the graph
$(V_1, E \cap (V_1 \times V_1))$. The computation $C$ merged with the
configuration $D$ (denoted $C \cdot D$) is the computation on the graph
$(V_1 \cup V_2, E \cap ((V_1 \cup V_2) \times (V_1 \cup V_2)))$ of length $n$
defined as follows:

- $(C \cdot D)(i)(v) = C(i)(v)$ if $v \in V_1$;

- $(C \cdot D)(i)(v) = D(v)$ if $v \in V_2$.

This definition is clearly correct. When the same configuration is added to all
configurations of a computation $C$, there is only one way to violate the rules
of the reversible pebble game: some of the added direct predecessors of a vertex
on which a pebble is laid, or from which it is removed, are not pebbled. But this
is prohibited by the assumption of the definition.

Any computation on a graph $G$ can be applied to any graph $G'$ that is
isomorphic to $G$. The application of a computation is defined as follows:

Definition 6. Let $C$ be a computation of length $n$ on a dag $G$ and $G'$ a dag
isomorphic to $G$. Let $\varphi$ be the isomorphism between $G'$ and $G$. Then
the computation $C$ applied to the graph $G'$ (denoted $C|G'$) is the computation
on $G'$ of length $n$ such that $(C|G')(i)(v) = C(i)(\varphi(v))$ for all
$1 \le i \le n$ and for all vertices $v$ of $G'$.
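Merging and application can be sketched in the same style. In this sketch (configurations as frozensets, the dag as a dict of direct predecessors, all names hypothetical), `merge` checks the side condition of Definition 5 before adding the fixed configuration to every step, and `apply_to` takes the isomorphism $\varphi$ as a dict from vertices of $G'$ to vertices of $G$.

```python
from typing import Dict, FrozenSet, Iterable, List

Config = FrozenSet[int]
Dag = Dict[int, List[int]]          # vertex -> direct predecessors


def merge(g: Dag, comp: List[Config], conf: Config,
          v1: Iterable[int], v2: Iterable[int]) -> List[Config]:
    """Computation on V1 merged with a fixed configuration on V2 (Definition 5)."""
    v1, v2 = frozenset(v1), frozenset(v2)
    if v1 & v2:
        raise ValueError("V1 and V2 must be disjoint")
    # Side condition: no uncovered vertex of V2 is a direct predecessor of a vertex in V1.
    for v in v1:
        if any(w in v2 and w not in conf for w in g[v]):
            raise ValueError(f"an uncovered vertex of V2 precedes vertex {v}")
    return [c | conf for c in comp]


def apply_to(comp: List[Config], phi: Dict[int, int]) -> List[Config]:
    """C|G': (C|G')(i)(v) = C(i)(phi(v)), with phi mapping vertices of G' to G."""
    inverse = {gv: v for v, gv in phi.items()}
    return [frozenset(inverse[u] for u in c) for c in comp]
```

With the chain 1 → 2 → 3, pebbling and unpebbling vertex 3 can be merged with the fixed configuration {2}, but not with the empty configuration, because vertex 2 is a direct predecessor of 3.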

3 Chain Topology 

The simplest topology for a pebble game is a chain. The chain with $n$ vertices
(denoted $Ch(n)$) is the dag $Ch(n) = (V, E)$, where $V = \{1 \dots n\}$ and
$E = \{(i - 1, i) \mid i \in \{2 \dots n\}\}$. This topology is an abstraction of
a simple straightforward computation, where the result of step $n + 1$ can be
computed only from the result of step $n$.

In this section we discuss the optimal space complexity of the reversible pebble
game on the chain topology, i.e. the minimal space function $S_{min}(n)$ for
$\mathcal{C}h$, where the subclass $\mathcal{C}h_n$ contains only the chain
$Ch(n)$. We also discuss partial lower and upper bounds for optimal time and space
complexities: the time function $T(n, S_{min}(n))$ and an upper bound on the
time-space tradeoff for the chain topology.



3.1 Optimal Space for the Chain Topology 

To determine the space complexity of the reversible pebble game on the chain
topology, we examine the maximum length of a chain that can be pebbled with $p$
pebbles. We denote this length by $S^{-1}(p)$. It holds that
$S^{-1}(p) = \max\{m \mid (\exists C \in \mathcal{C}_{Ch}(m))\ S(C) \le p\}$,
where $\mathcal{C}_{Ch}(m)$ is the set of all complete computations on the graph
$Ch(m)$.
The reversible pebble game on the chain topology was studied in connection with
the reversible simulation of irreversible computation. C. H. Bennett suggested
in [1] a pebbling strategy whose special case has space complexity $O(\lg n)$.
Space optimality of this algorithm was proved in [5] and [4]. This result is
formulated in the following theorem.

Theorem 1. It holds that $S^{-1}(p) = 2^p - 1$. Therefore for the minimal space
function $S_{min}(n)$ of the chain topology it holds that

$S_{min}(n) = \Theta(\lg n)$
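One way to see the bound $S^{-1}(p) \ge 2^p - 1$ is the recursive strategy below, a sketch in the spirit of Bennett's strategy with $k = 2$ (it is not claimed to be the exact construction of [1]). A computation on $Ch(n)$ is encoded as a list of moves, each move toggling one pebble, which is an assumption of this illustration. `reach(start, p)` leaves exactly vertex $start + 2^{p-1}$ pebbled using at most $p$ pebbles, and `complete(0, p)` is a complete computation on $Ch(2^p - 1)$; all names are hypothetical.

```python
def reach(start: int, p: int) -> list[int]:
    """Moves that, with vertex `start` held (or start == 0), leave only vertex
    start + 2**(p-1) pebbled among start+1, start+2, ...; at most p pebbles."""
    if p == 1:
        return [start + 1]
    half = 2 ** (p - 2)
    first = reach(start, p - 1)                   # pebble start + half
    second = reach(start + half, p - 1)           # pebble start + 2 * half
    return first + second + first[::-1]           # undo `first`: frees start + half


def complete(start: int, p: int) -> list[int]:
    """A complete computation on the 2**p - 1 vertices after `start`, using p pebbles."""
    if p == 0:
        return []
    half = 2 ** (p - 1)
    first = reach(start, p)                       # only start + half stays pebbled
    return first + complete(start + half, p - 1) + first[::-1]


def simulate(n: int, moves: list[int]) -> tuple[int, int]:
    """Play the moves on Ch(n); return (space, time), raising on an illegal move."""
    covered: set[int] = set()
    touched: set[int] = set()
    peak = 0
    for v in moves:
        if not 1 <= v <= n or (v > 1 and v - 1 not in covered):
            raise ValueError(f"illegal move on vertex {v}")   # R1 / R2 on a chain
        covered ^= {v}
        touched.add(v)
        peak = max(peak, len(covered))
    if covered or touched != set(range(1, n + 1)):
        raise ValueError("computation is not complete")
    return peak, len(moves)
```

For $p = 1, \dots, 7$ the simulator reports space exactly $p$ and time $3^p - 1$; for $n = 2^p - 1$ this is of order $n^{\lg 3}$, in line with the upper bound of [4] mentioned in the introduction.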

3.2 Optimal Time and Space for the Chain Topology 

In this section we present upper and partial lower bounds on time for the
space-optimal reversible pebble game played on the chain topology.

We will use two auxiliary lemmas. Their proofs are not difficult and are left 
out due to space reasons. 

Lemma 1. Let $C$ be a complete computation of length $l$ on $Ch(n)$ with
$S(C) = S_{min}(n)$ and $T(C) = T(n, S_{min}(n))$. Let
$i = \min\{t \mid t \in \{1 \dots l\} \wedge C(t)(n) = 1\}$. Then it holds that

$T(Rst(C, 1, i)) = T(Rst(C, i, l)) = \frac{1}{2} T(C)$

Lemma 2. Let $C$ be a complete computation of length $l$ on $Ch(n)$, where
$n = S^{-1}(p + 1)$, such that $S(C) = p + 1$. It holds that

$\max\{\min\{j \mid j \in \{1 \dots n\} \wedge C(i)(j) = 1\} \mid i \in \{1 \dots l\}\} = S^{-1}(p) + 1$

Now we prove the upper and partial lower bound on time for space optimal 
pebble game: 

Theorem 2. $T(S^{-1}(p + 1), p + 1) \ge 2S^{-1}(p) + 2 + 2T(S^{-1}(p), p)$

Proof. Let $C$ be a time-optimal complete computation on $Ch(S^{-1}(p + 1))$
such that $S(C) = p + 1$. Let $l$ be the length of $C$. Clearly
$T(C) = T(S^{-1}(p + 1), p + 1)$. We prove that
$T(C) \ge 2S^{-1}(p) + 2 + 2T(S^{-1}(p), p)$ holds.

Let $n = S^{-1}(p)$, $G_1 = (\{n + 1\}, \emptyset)$ and let $G_2$ be the graph
obtained from $Ch(n)$ by renaming its vertices to
$n + 2, \dots, 2n + 1 = S^{-1}(p + 1)$. Let
$i = \min\{t \mid t \in \{1 \dots l\} \wedge C(t)(2n + 1) = 1\}$. By Lemma 1 it
holds that $T(Rst(C, 1, i)) = \frac{1}{2}T(C)$. From Lemma 2 it follows that
$Rst(C, k, k, \{1 \dots n + 1\}) \neq E(Ch(n)) \cdot E(G_1)$ for all $k$ and that
there exists $j$ such that
$Rst(C, j, j, \{1 \dots n + 1\}) = E(Ch(n)) \cdot Put(E(G_1), n + 1, 1)$.
W.l.o.g. we can assume $j < i$ (otherwise we can replace $C$ by $Rev(C)$). Let
$k$ be such that $C(k - 1)(n + 1) = 0$ and
$(\forall q)(k \le q \le j)\ C(q)(n + 1) = 1$. Clearly $C(k - 1)(n) = C(k)(n) = 1$.

Now consider the computation
$C_2 = Rst(C, 1, k - 1, \{1 \dots n\}) \cdot E(G_1) \cdot E(G_2)
+ Rst(C, k, j, \{1 \dots n\}) \cdot Put(E(G_1), n + 1, 1) \cdot E(G_2)
+ Rst(C, 1, i, \{n + 2 \dots 2n + 1\}) \cdot E(Ch(n)) \cdot Put(E(G_1), n + 1, 1)$.
Clearly $S(C_2) \le S(C)$. Also, $C_2 + Rev(C_2)$ is complete on $Ch(2n + 1)$ and
$T(C_2) \le T(Rst(C, 1, i))$.
It is clear that $T(Rst(C, 1, k - 1, \{1 \dots n\})) \ge n$: we cannot pebble $n$
vertices in time less than $n$.
$Rev(Rst(C, k, j, \{1 \dots n\})) + Rst(C, k, j, \{1 \dots n\})$ is a
space-optimal complete computation on $Ch(n)$, therefore
$T(Rst(C, k, j, \{1 \dots n\})) \ge \frac{1}{2}T(n, S_{min}(n)) = \frac{1}{2}T(S^{-1}(p), p)$.
Analogously,
$T(Rst(C, 1, i, \{n + 2 \dots 2n + 1\})) \ge \frac{1}{2}T(n, S_{min}(n)) = \frac{1}{2}T(S^{-1}(p), p)$.

From these inequalities it follows that
$T(Rst(C, 1, i)) \ge n + 1 + 2 \cdot \frac{1}{2}T(n, S_{min}(n))$. Therefore
$T(C) \ge 2n + 2 + 2T(n, S_{min}(n)) = 2S^{-1}(p) + 2 + 2T(S^{-1}(p), p)$. □

Corollary 1. $T(n, S_{min}(n)) = O(n^{\lg 3})$ and $T(n, S_{min}(n))$ is not
$o(n \lg n)$.

Proof. The upper bound was presented in [4]. By solving the recurrence
inequality proved in the preceding theorem, we obtain that
$T(n, S_{min}(n)) = \Omega(n \lg n)$ for $n = 2^p - 1$. Since this function is a
restriction of $T(n, S_{min}(n))$ to these values of $n$, the function
$T(n, S_{min}(n))$ cannot be $o(n \lg n)$. □
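The recurrence of Theorem 2 can be solved explicitly. A short derivation, assuming the base value $T(1, 1) = 2$ (the single vertex of $Ch(1)$ is pebbled and unpebbled) and writing $a_p$ for the resulting lower bound on $T(S^{-1}(p), p)$:

```latex
% n = S^{-1}(p) = 2^p - 1 and S_{min}(n) = p (Theorem 1)
a_1 = T(1,1) = 2, \qquad
a_{p+1} = 2\,(2^p - 1) + 2 + 2\,a_p = 2^{p+1} + 2\,a_p .

% dividing by 2^{p+1}:  a_{p+1}/2^{p+1} = a_p/2^p + 1,  hence  a_p/2^p = p
a_p = p\,2^p = (n + 1)\lg(n + 1),
\qquad
T(n, S_{min}(n)) = T(S^{-1}(p), p) \ge a_p = \Omega(n \lg n).
```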



3.3 Upper Bound on Time-Space Tradeoff for Chain Topology 

In the previous section we analysed the time complexity of reversible pebbling
for space-optimal computations. Now we discuss the time complexity of
computations that are not space optimal.

It is obvious that for any complete computation $C$ on $Ch(n)$ it holds that
$T(C) \ge 2n$, because each vertex has to be pebbled at least once and unpebbled
at least once. It is also easy to see that the space of a computation attaining
this bound is exactly $n$.

Now we analyse the space complexity of complete computations on $Ch(n)$ that run
in time at most $c \cdot n$. Let
$S^{-1}(c, k) = \max\{n \mid \exists C \in \mathcal{S}_{ch}(n):\ S(C) \le k \wedge T(C) \le cn\}$,
where $\mathcal{S}_{ch}(n)$ is the set of all complete computations on $Ch(n)$.

Theorem 3. For a fixed $k$, it holds that $S^{-1}(2^k, p) = \Omega(p^k)$.

Proof. We prove the statement $S^{-1}(2^k, p) \ge c(k)p^k$ by induction on $k$ and
$p$. Let $c(1) = 1$. The base case $S^{-1}(2^1, p) \ge p$ holds trivially. (It is
easy to construct a complete computation $C$ on $Ch(p)$ satisfying $S(C) = p$ and
$T(C) = 2p$.)

Assume by the induction hypothesis that
$(\forall k' < k)(\forall p')\ S^{-1}(2^{k'}, p') \ge c(k')p'^{k'}$ and
$(\forall p' < p)\ S^{-1}(2^k, p') \ge c(k)p'^k$. We prove that
$S^{-1}(2^k, p) \ge c(k)p^k$ holds.

Write $n_1 = S^{-1}(2^{k-1}, p - 1)$ and $n_2 = S^{-1}(2^k, p - 1)$. Let $C_1$ be
a complete computation on $Ch(n_1)$ with $S(C_1) \le p - 1$ and
$T(C_1) \le 2^{k-1} n_1$. Denote the length of $C_1$ by $l_1$. Clearly there
exists $m$ such that $C_1(m)(n_1) = 1$.

Let $C_2$ be a complete computation on $Ch(n_2)$ with $S(C_2) \le p - 1$ and
$T(C_2) \le 2^k n_2$.

Let $G_1 = (\{n_1 + 1\}, \emptyset)$. Let $G_2$ be the graph obtained from
$Ch(n_2)$ by renaming its vertices to $n_1 + 2, \dots, n_1 + 1 + n_2$. Now
consider the following computation:

$C_3 = Rst(C_1, 1, m) \cdot E(G_1) \cdot E(G_2)
+ Rst(C_1, m, l_1) \cdot Put(E(G_1), n_1 + 1, 1) \cdot E(G_2)
+ (C_2|G_2) \cdot E(Ch(n_1)) \cdot Put(E(G_1), n_1 + 1, 1)
+ Rev(Rst(C_1, m, l_1)) \cdot Put(E(G_1), n_1 + 1, 1) \cdot E(G_2)
+ Rev(Rst(C_1, 1, m)) \cdot E(G_1) \cdot E(G_2)$.

Clearly $C_3$ is a complete computation on $Ch(n_1 + 1 + n_2)$ satisfying
$S(C_3) \le p$ and
$T(C_3) \le 2T(C_1) + 2 + T(C_2) \le 2^k n_1 + 2 + 2^k n_2 \le 2^k(n_1 + 1 + n_2)$.
Therefore $S^{-1}(2^k, p) \ge n_1 + 1 + n_2$, i.e.
$S^{-1}(2^k, p) \ge S^{-1}(2^{k-1}, p - 1) + 1 + S^{-1}(2^k, p - 1)$.

By the induction hypothesis we have
$S^{-1}(2^k, p) \ge c(k - 1)(p - 1)^{k-1} + c(k)(p - 1)^k$. For a suitable value
of $c(k)$, depending only on $k$ and $c(k - 1)$, it holds that
$c(k - 1)(p - 1)^{k-1} + c(k)(p - 1)^k \ge c(k)p^k$. Hence there exists $c(k)$
such that $S^{-1}(2^k, p) \ge c(k)p^k$. □
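The inductive construction of $C_3$ can be executed directly. The sketch below encodes computations on a chain as lists of single-pebble moves (an assumption of this illustration) and builds $C_3$ from $C_1$ (parameters $k - 1, p - 1$) and $C_2$ (parameters $k, p - 1$) in the order of the formula: the prefix of $C_1$ up to $m$, the single vertex of $G_1$, the rest of $C_1$, $C_2$ shifted past that vertex, then the reversed parts. The base cases ($p = 1$, or $k = 1$ with a pebble on every vertex) follow the proof; `length`, `build` and `simulate` are hypothetical names.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def length(k: int, p: int) -> int:
    """Chain length reached with p pebbles and time <= 2**k * length (recursion of Theorem 3)."""
    if p == 1:
        return 1
    if k == 1:
        return p
    return length(k - 1, p - 1) + 1 + length(k, p - 1)


def build(k: int, p: int, start: int = 0) -> list[int]:
    """Moves of a complete computation on vertices start+1 .. start+length(k, p)."""
    if p == 1:
        return [start + 1, start + 1]
    if k == 1:                                   # pebble every vertex, then unpebble
        up = list(range(start + 1, start + p + 1))
        return up + up[::-1]
    n1 = length(k - 1, p - 1)
    c1 = build(k - 1, p - 1, start)              # C1: p - 1 pebbles, time <= 2**(k-1) * n1
    m = c1.index(start + n1) + 1                 # prefix of C1 ends when its last vertex is pebbled
    prefix, suffix = c1[:m], c1[m:]
    bridge = start + n1 + 1                      # the single vertex of G1
    c2 = build(k, p - 1, bridge)                 # C2 applied to G2, just after the bridge
    return (prefix + [bridge] + suffix + c2
            + suffix[::-1] + [bridge] + prefix[::-1])


def simulate(n: int, moves: list[int]) -> tuple[int, int]:
    """Play the moves on Ch(n); return (space, time), raising on an illegal move."""
    covered: set[int] = set()
    touched: set[int] = set()
    peak = 0
    for v in moves:
        if not 1 <= v <= n or (v > 1 and v - 1 not in covered):
            raise ValueError(f"illegal move on vertex {v}")
        covered ^= {v}
        touched.add(v)
        peak = max(peak, len(covered))
    if covered or touched != set(range(1, n + 1)):
        raise ValueError("computation is not complete")
    return peak, len(moves)
```

For $k = 2$ the lengths produced are $p(p + 1)/2$, i.e. quadratic in $p$, as the $\Omega(p^k)$ bound predicts.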

Corollary 2. Let $k$ be fixed. Then $O(\sqrt[k]{n})$ pebbles are sufficient for a
complete computation on $Ch(n)$ with time $O(2^k n)$.

Another upper bound on the time-space tradeoff for reversible pebbling on the
chain topology can be obtained by using Bennett's pebbling strategy introduced
in [1]. Since this strategy pebbles a chain of length $k^m$ with $m(k - 1) + 1$
pebbles in time $(2k - 1)^m$, it yields a time-space tradeoff of the form: space
$O(\frac{k-1}{\lg k} \lg n)$ versus time $O(n^{\lg(2k-1)/\lg k})$.
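The tradeoff follows by eliminating the recursion depth, written $m$ here (the exponent in the chain length $k^m$):

```latex
n = k^m, \qquad S = m(k-1) + 1, \qquad T = (2k-1)^m,
\qquad m = \log_k n = \frac{\lg n}{\lg k}

S = \frac{k-1}{\lg k}\,\lg n + 1 = O\!\left(\frac{k-1}{\lg k}\,\lg n\right),
\qquad
T = (2k-1)^{\lg n / \lg k} = 2^{\lg(2k-1)\,\lg n / \lg k} = n^{\lg(2k-1)/\lg k}

% k = 2 gives S = \lg n + 1 and T = n^{\lg 3}
```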

4 Binary Tree Topology 

In this section we discuss the space complexity of the reversible pebble game on
complete binary trees. A complete binary tree of height 1 (denoted $Bt(1)$) is a
graph containing one vertex and no edges. A complete binary tree of height
$h > 1$ (denoted $Bt(h)$) consists of a root vertex and two subtrees that are
complete binary trees of height $h - 1$.

This topology represents a class of problems, where the result can be com- 
puted from two different subproblems. 

We denote the root vertex of $Bt(h)$ by $R(Bt(h))$, the left subtree of $Bt(h)$
by $Lt(Bt(h))$ and the right subtree of $Bt(h)$ by $Rt(Bt(h))$.

As mentioned in Section 2, we denote the minimal number of pebbles needed to
perform a complete computation on $Bt(h)$ by $S_{min}(h)$. In the sequel we also
consider the minimal number of pebbles needed to perform a computation from the
empty configuration to a configuration where only the root is pebbled.

Definition 7. Let $C$ be a computation of length $l$ on $Bt(h)$. Let
$C(1) = E(Bt(h))$ and $C(l) = Put(E(Bt(h)), R(Bt(h)), 1)$. Then $C$ is called a
semicomplete computation.

The minimal number of pebbles needed to perform a semicomplete computation on
$Bt(h)$ (i.e. $\min\{S(C)\}$ over all semicomplete computations $C$) will be
denoted by $S'_{min}(h)$.

We will use the following inequalities between $S_{min}(h)$ and $S'_{min}(h)$.
Their proofs are not difficult and are left out due to space reasons.
Lemma 3. $S_{min}(h) + 1 \ge S'_{min}(h) \ge S_{min}(h)$

Lemma 4. $S'_{min}(h + 1) = S_{min}(h) + 2$
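For very small trees, Lemmas 3 and 4 can be checked by exhaustive search. Because transitions are symmetric, a complete computation with at most $s$ pebbles exists exactly when every vertex is covered in some configuration reachable from the empty one, and a semicomplete one exists exactly when the configuration consisting of the root alone is reachable. The sketch below is a breadth-first search over bitmask configurations of $Bt(h)$ laid out in heap order (an assumption of this illustration); it is exponential in the number of vertices and only meant as a sanity check. All names are hypothetical.

```python
from collections import deque


def predecessor_masks(h: int) -> list[int]:
    """Bt(h) in heap order: vertices 1 .. 2**h - 1, root 1, children 2v and 2v + 1.
    Entry v is the bitmask of v's direct predecessors (its children)."""
    n = 2 ** h - 1
    masks = [0] * (n + 1)
    for v in range(1, n + 1):
        for c in (2 * v, 2 * v + 1):
            if c <= n:
                masks[v] |= 1 << c
    return masks


def reachable(h: int, s: int) -> set[int]:
    """All configurations (bitmasks) reachable from the empty one with at most s pebbles."""
    masks = predecessor_masks(h)
    seen, queue = {0}, deque([0])
    while queue:
        conf = queue.popleft()
        for v in range(1, len(masks)):
            if (conf & masks[v]) == masks[v]:          # R1 / R2: predecessors covered
                nxt = conf ^ (1 << v)
                if bin(nxt).count("1") <= s and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


def s_min(h: int) -> int:
    """Least s for which every vertex is covered in some reachable configuration."""
    full = (1 << 2 ** h) - 2                           # bits 1 .. 2**h - 1
    s = 1
    while True:
        covered = 0
        for conf in reachable(h, s):
            covered |= conf
        if covered == full:
            return s
        s += 1


def s_semi(h: int) -> int:
    """Least s for which the configuration {root} is reachable."""
    root_only = 1 << 1
    s = 1
    while root_only not in reachable(h, s):
        s += 1
    return s
```

For $h = 1, 2, 3$ it gives $S_{min}(h) = 1, 3, 4$ and $S'_{min}(h) = 1, 3, 5$, consistent with both lemmas.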

4.1 Tight Space Bound for Binary Tree Topology 

From the previous lemmas it follows that $S'_{min}(h)$ equals $h$ plus the number
of those $i < h$ for which $S'_{min}(i) = S_{min}(i)$. In the following
considerations we use a function $S'^{-1}_{min}(p)$. The value
$h = S'^{-1}_{min}(p)$ denotes the maximal height of a binary tree that can be
pebbled by a semicomplete computation using at most $h + p$ pebbles. Formally,
$S'^{-1}_{min}(p) = \max\{h \mid \exists C \in \mathcal{S}_{ch} \wedge S(C) \le h + p\}$,
where $\mathcal{S}_{ch}$ is the set of all semicomplete computations on $Bt(h)$.
From the definition of $S'^{-1}_{min}(p)$ it follows that
$S'_{min}(h) = h + (S'^{-1}_{min})^{-1}(h)$.

Now we prove upper (lower) bounds on $S'^{-1}_{min}(p)$. From them follow lower
(upper) bounds on $S'_{min}(h)$ and therefore also lower (upper) bounds on
$S_{min}(h)$, respectively.

Lemma 5. Let $h' = S'^{-1}_{min}(p)$ and $h = S'^{-1}_{min}(p + 1)$. Then the
following inequality holds:

$h - h' - 1 \le 2^{h' + p + 1} - 1$

Proof. A configuration on a binary tree is called open if there exists a path
from the root to some leaf of the tree such that no pebble is laid on this path.
Otherwise, the configuration is called closed.

From the assumption $h = S'^{-1}_{min}(p + 1)$ it follows that there exists a
semicomplete computation $C$ of length $l$ on $Bt(h)$ such that
$S(C) \le h + p + 1$. Let $i$ be the first configuration of $C$ such that
$C(j)(R(Bt(h))) = 1$ for all $j \ge i$ (i.e.
$i = \min\{t \mid (\forall j \ge t)\ C(j)(R(Bt(h))) = 1\}$).

Because $C$ is a reversible computation,
$C(i)(R(Lt(Bt(h)))) = C(i)(R(Rt(Bt(h)))) = 1$. Therefore
$Put(C(i), R(Bt(h)), 0)$ is a closed configuration. Because
$Put(C(l), R(Bt(h)), 0) = E(Bt(h))$, this configuration is open. Let $j$ be the
minimal number such that $j > i$ and $Put(C(j), R(Bt(h)), 0)$ is open.

Because $Put(C(j), R(Bt(h)), 0)$ is open, $Put(C(j - 1), R(Bt(h)), 0)$ is closed
and $C$ is a reversible computation, there exists exactly one path in $C(j)$ from
the root to a leaf such that no pebble is laid on it. Without loss of generality
we can assume that this path is
$R(Bt(h)), R(Rt(Bt(h))), R(Rt^2(Bt(h))), \dots, R(Rt^{h-1}(Bt(h)))$.

Now we prove that for each $k$, $h \ge k \ge h' + 2$, and for each $r$,
$i \le r \le j$, it holds that $\#(C(r)(Lt(Rt^{h-k}(Bt(h))))) > 0$.

Assume that this claim does not hold. Let $k$ be the maximal number that
violates it. Let $r$ be the maximal number such that $i \le r \le j$ and
$\#(C(r)(Lt(Rt^{h-k}(Bt(h))))) = 0 \vee \#(C(r)(Rt(Rt^{h-k}(Bt(h))))) = 0$.
Without loss of generality, let $\#(C(r)(Lt(Rt^{h-k}(Bt(h))))) = 0$. Because
$Put(C(r), R(Bt(h)), 0)$ is closed, at least one vertex from
$R(Rt(Bt(h))), R(Rt^2(Bt(h))), \dots, R(Rt^{h-k}(Bt(h)))$ is pebbled in the
configuration $C(r)$. In the configuration $C(j)$ all these vertices are
unpebbled. Let $q$ be the minimal number such that $q > r$ and all these
vertices are unpebbled in $C(q)$. Because $C$ is a reversible computation,
$C(q - 1)(R(Rt^{h-k}(Bt(h)))) = 1$, $C(q)(R(Rt^{h-k}(Bt(h)))) = 0$ and
$C(q)(R(Lt(Rt^{h-k}(Bt(h))))) = 1$. Now consider the computation
$C' = Rst(C, r, q, Lt(Rt^{h-k}(Bt(h))))$. The computation $C' + Rev(C')$ is a
complete computation on $Lt(Rt^{h-k}(Bt(h)))$ (this graph is isomorphic to
$Bt(k - 1)$). The space of this computation is at most
$S(C' + Rev(C')) \le S(C) - (3 + h - k) = k + p - 2$. From our assumption it
follows that the space of any semicomplete computation on $Bt(k - 1)$ is at
least $k + p$. From Lemma 3 it follows that the space of any complete
computation on $Bt(k - 1)$ is at least $k + p - 1$, which is a contradiction.

Now consider
$C_2 = Rst(C, i, j, \{R(Rt(Bt(h))), R(Rt^2(Bt(h))), \dots, R(Rt^{h-h'-1}(Bt(h)))\})$.
It is a computation on a graph isomorphic to $Ch(h - h' - 1)$.

In the first configuration of $C_2$, the vertex $R(Rt(Bt(h)))$ is pebbled. In the
last configuration of $C_2$, no vertex is pebbled. Therefore $Rev(C_2) + C_2$ is
a complete computation on a graph isomorphic to $Ch(h - h' - 1)$.

Because for each $k$, $h \ge k \ge h' + 2$, and for each $r$, $i \le r \le j$, it
holds that $\#(C(r)(Lt(Rt^{h-k}(Bt(h))))) > 0$ and $C(r)(R(Bt(h))) = 1$, we can
estimate an upper bound on the space of $C_2$:
$S(C_2) \le (h + p + 1) - (1 + h - h' - 1) = h' + p + 1$. Using the space upper
bound for the chain topology (Theorem 1), we have
$h - h' - 1 \le 2^{h' + p + 1} - 1$. □



Lemma 6. Let $h' = S'^{-1}_{min}(p)$ and $h = S'^{-1}_{min}(p + 1)$. Then the
following inequality holds:

$h - h' - 1 \ge 2^{h' + p - 2} - 1$

Proof. We prove by induction that for each
$k \in \{h' + 1 \dots h' + 2^{h' + p - 2}\}$ there exists a semicomplete
computation $C$ on $Bt(k)$ such that $S(C) \le k + p + 1$. This implies that
$h \ge h' + 2^{h' + p - 2}$.

The base case is $k = h' + 1$. By assumption there exists a semicomplete
computation $C$ on $Bt(h')$ such that $S(C) \le h' + p$. After applying $C$ to
$Lt(Bt(k))$ and $Rt(Bt(k))$, pebbling $R(Bt(k))$ and applying the reversed $C$ to
$Lt(Bt(k))$ and $Rt(Bt(k))$, we obtain a semicomplete computation on $Bt(k)$ that
uses at most $h' + p + 2 = k + p + 1$ pebbles.

Now assume that the induction hypothesis holds for each
$i \in \{h' + 1 \dots k - 1\}$. We construct a computation $C$ on $Bt(k)$ as
follows. First we apply semicomplete computations on
$Lt(Bt(k)), Lt(Rt(Bt(k))), \dots, Lt(Rt^{k-h'-2}(Bt(k)))$,
$Lt(Rt^{k-h'-1}(Bt(k)))$ and $Rt^{k-h'}(Bt(k))$ sequentially. By the induction
hypothesis, the space of a semicomplete computation on $Lt(Rt^i(Bt(k)))$ is at
most $(k - i - 1) + p + 1$ for $i \le k - h' - 2$. By assumption, the space of a
semicomplete computation on $Lt(Rt^{k-h'-1}(Bt(k)))$ and on
$Rt(Rt^{k-h'-1}(Bt(k)))$ is at most $h' + p$. Therefore the space of this part of
$C$ is at most $k + p$.

In the second part of $C$, we perform a space-optimal complete computation on
the chain consisting of vertices $R(Bt(k)), R(Rt(Bt(k))), \dots, R(Rt^{k-h'-1}(Bt(k)))$.
By Theorem 1, the space of this part is at most $\lceil \log_2(k-h'+1) \rceil +
k - h' + 1$. Because $k \le h' + 2^{h'+p-2}$, it holds that $\lceil \log_2(k-h'+1) \rceil + k - h' + 1 \le k+p$.

The third part of the computation C is the reversed first part. 
Thus $C$ is a complete computation on $Bt(k)$ and $S(C) \le k+p$. Hence
$S_{\min}(k) \le k+p$. Using Lemma 3, $S'_{\min}(k) \le k+p+1$. Therefore there exists
a semicomplete computation on $Bt(k)$ with space at most $k+p+1$. □
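The arithmetic step that closes the second part of the proof (the chain phase fits into $k+p$ pebbles whenever $k \le h'+2^{h'+p-2}$) can be checked mechanically for small parameters. The Python sketch below is only a sanity check of that inequality, not of the pebbling construction itself. It computes $\lceil \log_2 \rceil$ with exact integer arithmetic. The helper names are illustrative.

```python
def ceil_log2(x: int) -> int:
    """Smallest integer c with 2**c >= x (for x >= 1), exact on integers."""
    return (x - 1).bit_length()


def second_part_fits(h_prime: int, p: int) -> bool:
    """Check ceil(log2(k - h' + 1)) + (k - h' + 1) <= k + p
    for every k in {h' + 1, ..., h' + 2**(h' + p - 2)}; needs h' + p >= 2."""
    upper = h_prime + 2 ** (h_prime + p - 2)
    for k in range(h_prime + 1, upper + 1):
        # the two terms of the space bound for the second part of C
        bound = ceil_log2(k - h_prime + 1) + (k - h_prime + 1)
        if bound > k + p:
            return False
    return True


# Exhaustive check over small parameter values.
print(all(second_part_fits(hp, p) for hp in range(0, 7) for p in range(2, 6)))  # True
```

The check agrees with a one-line argument. Let $m = k - h' \le 2^{h'+p-2}$. Then $m + 1 \le 2^{h'+p-1}$, so the logarithm term is at most $h'+p-1$ and the whole bound is at most $h'+p+m = k+p$.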

Lemma 7. For $p \ge 2$ it holds that $2^{S'^{-1}_{\min}(p)} \le S'^{-1}_{\min}(p+1) \le 2^{4S'^{-1}_{\min}(p)}$.

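Proof. Let $h' = S'^{-1}_{\min}(p)$, $h = S'^{-1}_{\min}(p+1)$. From Lemma 5 it follows that $h - h' \le 2^{h'+p+1} + 1$, which is equivalent to $S'^{-1}_{\min}(p+1) \le 2^{S'^{-1}_{\min}(p)+p+1} + S'^{-1}_{\min}(p) + 1$.

From the definition of $S'^{-1}_{\min}$ and from Lemma 3 and Lemma 4 it trivially follows
that $S'^{-1}_{\min}(p) \ge 2p$. Therefore $2^{S'^{-1}_{\min}(p)+p+1} + S'^{-1}_{\min}(p) + 1 \le 2^{4S'^{-1}_{\min}(p)}$ for $p \ge 1$. Thus
the second inequality holds.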

From Lemma 6 it follows that $h - h' \ge 2^{h'+p-2}$, which is equivalent to $S'^{-1}_{\min}(p+1) \ge
2^{S'^{-1}_{\min}(p)+p-2} + S'^{-1}_{\min}(p)$. Therefore for $p \ge 2$ it holds that $S'^{-1}_{\min}(p+1) \ge 2^{S'^{-1}_{\min}(p)}$. □

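Theorem 4. $S_{\min}(h) = h + \Theta(\lg^*(h))$

Proof. From the previous lemma it follows that $S'^{-1}_{\min}(p) = O\big(16^{16^{\cdot^{\cdot^{16}}}}\big)$ and that
$S'^{-1}_{\min}(p) = \Omega\big(2^{2^{\cdot^{\cdot^{2}}}}\big)$, where both towers have height $p$. Because $S'_{\min}(h) = h + (S'^{-1}_{\min})^{-1}(h)$, it holds that $S'_{\min}(h) =
h + \Omega(\lg^*(h))$ and $S'_{\min}(h) = h + O(\lg^*(h))$. Therefore $S'_{\min}(h) = h + \Theta(\lg^*(h))$.
From Lemma 3 it follows that $S_{\min}(h) = h + \Theta(\lg^*(h))$. □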
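The last step of the proof uses the fact that the inverse of a tower function grows like the iterated logarithm $\lg^*$. The Python sketch below illustrates this fact. It computes both functions and shows that $\lg^*$ exactly undoes a tower of 2s. A tower of 16s is undone after only a few more steps, which is why both bounds collapse to $\Theta(\lg^*(h))$. `lg_star` and `tower` are illustrative helpers, not taken from the paper.

```python
import math


def lg_star(n) -> int:
    """Iterated binary logarithm: how often lg is applied before the value drops to <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)  # math.log2 also accepts arbitrarily large Python ints
        count += 1
    return count


def tower(base: int, height: int) -> int:
    """base ** base ** ... ** base with `height` copies of base; tower(b, 0) == 1."""
    value = 1
    for _ in range(height):
        value = base ** value
    return value


for p in range(6):
    print(p, lg_star(tower(2, p)))  # lg* recovers the tower height p
print(lg_star(tower(16, 2)))        # 16**16 = 2**64: lg* is 5
```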

4.2 Extension to Butterflies 

Butterfly graphs form an important class of graphs to study, as they share the
superconcentrator property and form the inherent structure of some
important problems in numerical computation, such as the discrete FFT.

A butterfly graph of order $d$ is a graph $G = (U, E)$, where $U = \{1, \dots, d\} \times
\{0, \dots, 2^{d-1} - 1\}$ and $E = \{((i, j), (i+1, j)),\ ((i, j), (i+1, j \mathbin{\mathrm{xor}} 2^{i-1})) \mid 1 \le i < d,\ 0 \le j \le 2^{d-1} - 1\}$.
This graph can be decomposed into $2^{d-1}$ complete binary trees of height $d$. The
root of the $i$-th tree is vertex $(1, i)$, and this tree contains all vertices that can be
reached from the root.

The decomposition property implies that the minimal space complexity of
a complete computation on a butterfly graph of order $d$ cannot be lower than the
minimal space complexity on a complete binary tree of height $d$ (otherwise we
could restrict a complete computation on the butterfly to any binary tree to obtain a
contradiction).

On the other hand, by sequentially applying complete computations to all
binary trees obtained by the decomposition of the butterfly graph, we obtain a complete computation on it. Thus we can construct a complete computation on a
butterfly graph of order $d$ with space complexity equal to the minimal space complexity of the binary tree of height $d$. Therefore the minimal space complexity of
the butterfly topology equals the minimal space complexity of the binary tree
topology (i.e., the minimal space for a butterfly graph of order $d$ is $d + \Theta(\lg^*(d))$).
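The decomposition claim can be checked concretely. The Python sketch below builds the order-$d$ butterfly under one assumption: every vertex $(i, j)$ with $i < d$ has edges to $(i+1, j)$ and to $(i+1, j \mathbin{\mathrm{xor}} 2^{i-1})$, the standard butterfly edge set. It then checks that the part reachable from each root $(1, r)$ is a complete binary tree with $2^d - 1$ vertices, and that these trees together cover the whole graph. The helper names are illustrative.

```python
def butterfly(d: int) -> dict:
    """Adjacency map of the order-d butterfly, ASSUMING (i, j) -> (i+1, j) and
    (i, j) -> (i+1, j xor 2**(i-1)) for 1 <= i < d."""
    adj = {(i, j): [] for i in range(1, d + 1) for j in range(2 ** (d - 1))}
    for i in range(1, d):
        for j in range(2 ** (d - 1)):
            adj[(i, j)] = [(i + 1, j), (i + 1, j ^ (1 << (i - 1)))]
    return adj


def reachable(adj: dict, root) -> set:
    """All vertices reachable from root (iterative depth-first search)."""
    seen, stack = {root}, [root]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def is_complete_binary_tree(adj: dict, root, d: int) -> bool:
    """The reachable part has 2**d - 1 vertices and every non-root vertex
    has exactly one parent inside it (so it is a tree of height d)."""
    part = reachable(adj, root)
    parents = {v: 0 for v in part}
    for v in part:
        for w in adj[v]:
            parents[w] += 1
    return len(part) == 2 ** d - 1 and all(
        parents[v] == (0 if v == root else 1) for v in part)


adj = butterfly(4)
print(all(is_complete_binary_tree(adj, (1, r), 4) for r in range(8)))
```

The trees overlap, so the decomposition is a covering rather than a partition. Running the tree computations one after another re-pebbles shared vertices. This costs time, but the space stays the maximum over the individual trees.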
5 Conclusion

In this paper we have analysed an abstract model for reversible computations, a
reversible pebble game. We have described a technique for proving time and space
complexity bounds for this game and presented a tight optimal space bound for
a chain topology, upper and partial lower bounds on the time of space-optimal computations for a
chain topology, an upper bound on the time-space tradeoff for a chain topology, and
a tight optimal space bound for a binary tree topology. These results imply
that reversible computations require more resources than standard irreversible
computations. (For the space complexity of a chain topology it is $\Theta(\lg n)$ vs. $\Theta(1)$,
and for the space complexity of a binary tree topology it is $h + \Theta(\lg^*(h))$ vs.
$h + \Theta(1)$.)

For further research, it would be interesting to examine the time complexity
of the reversible pebble game for tree and butterfly topologies and to consider
other important topologies, for example pyramids.
Abstract. Simulation of global illumination in 3D scenes is a computationally expensive task. One of the goals of the project HiQoS (High
Performance Multimedia Services with Quality of Service Guarantees)
was to develop and test a prototype of an e-commerce system which
simulates realistic lighting of large scenes on high performance parallel
computers. The system, although tailored to the needs of this specific
application, is very generic and exhibits metacomputing features: 1. the
access to high performance computers is fully transparent to the user;
2. the modular architecture of the system allows computing resources in
geographically different computing centers to be added or removed dynamically.
The prototype of the proposed system was evaluated in the industrial
contexts of architectural visualization and film production. This
paper summarizes the scientific and technical problems which arose during
the project as well as their solutions and the engineering decisions taken.



1 Introduction 

Photorealistic visualization of 3D models is important in many industrial areas,
for instance in architecture, the entertainment industry and film production. The
requirements on the level of realism, as well as the complexity of the models,
grow very fast, and so do the requirements on the performance of the systems
used for the visualization. The computing power needed for the synthesis of
photorealistic images (rendering in this paper) in a reasonable time falls into
the category of high performance computing.

Companies needing high-quality visualizations seldom have supercomputers
of their own. Such machines are either too expensive or regarded as irrelevant by
many IT managers. Renting and temporarily installing additional
computers in order to meet production deadlines entails technical
problems and additional costs.

A prototype of an advanced e-commerce rendering system addressing the 
above problems has been developed during the project HiQoS (High Perfor- 
mance Multimedia Services with Quality of Service Guarantees) [1], [2]. This 
system, HiQoS Rendering System, allows a user to submit rendering jobs via a 
simple web interface. The further processing of jobs is fully automatic. The com- 
plexity and the distributed nature of the system are hidden from the user. The 
project HiQoS was financially supported by the German Ministry of Education 
and Research (BMBF). The HiQoS Rendering project partners were: Axcent 
Media AG, GPO mbH, IEZ AG, University of Paderborn and Upstart! GmbH.



L. Pacholski and P. Ruzicka (Eds.): SOFSEM 2001, LNCS 2234, pp. 304-315, 2001.
© Springer-Verlag Berlin Heidelberg 2001
We know of several projects related to remote global lighting simulation.
Virtual Frog [3] is one of the pioneering works; it aims to support the teaching of
biological principles: “. . . our goal is to provide accessibility over the Web in order
to reduce the complexity of installing, running, and managing the software.”
Interactivity played a major role in this project: the model (of a frog) resides
on a server, and the server computes a desired visualization of the model (a
schematic view, a view of a scanned slice, a volume traced view, etc.). Online
Rendering of Mars [4] is technically very similar: the user chooses a perspective
and lighting and within a few minutes gets back a rendered picture of Mars.
Closer to the idea of the HiQoS project is a rendering server using the (sequential)
Radiance program to compute ray traced pictures of a user’s model [5]. There are
also companies offering their computers for rendering purposes [6]; however, the
data exchange, job specification and scheduling of the rendering jobs are handled
by human operators. The HiQoS rendering project went further, offering an
automatic and parallel rendering service on demand, whereby parallel computers
in several computing centers can be combined into a single computing system.

The global illumination problem is defined in section 2. Parallelizations of 
two global illumination methods, ray tracing and radiosity, are discussed in the
same section. Special attention is devoted to an efficient representation of the 
diffuse global illumination resulting from the radiosity method. Two approaches 
to the simplification of radiosity solutions are presented: mesh decimation and 
radiosity maps. The architecture of the HiQoS Rendering System is described 
in section 3. Section 4 describes an evaluation of the HiQoS Rendering System 
in two industrial scenarios, in architectural visualization and in film production. 
In section 5 we draw our conclusions and sketch our future research directions. 



2 Simulation of Global Illumination 

Photorealistic visualization of 3D models is one of the quests of computer graphics. Almost all light phenomena are well explained by quantum electrodynamics.
However, for practical purposes it is not desirable to simulate the propagation of light or to model large-scale 3D models on a subatomic level. Synthesis
of photorealistic pictures is a chain of simplifications. Kajiya’s rendering equation [7] provides a framework for the simulation of global lighting on the level of
geometrical optics. The rendering equation is an integral equation describing an
energy balance at all surface points of a 3D model. With the exception of extremely
simple cases, this equation cannot be solved analytically. The ray tracing and
radiosity methods [8] make additional assumptions about the light-object interactions in order to simplify the rendering equation.
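For reference, the rendering equation is often stated in a hemispherical form. This is a common modern notation, not Kajiya's original two-point formulation in [7]. It gives the outgoing radiance $L_o$ at a surface point $x$ in direction $\omega_o$ as the emitted radiance plus the incoming radiance reflected by the BRDF $f_r$, weighted by the cosine of the incidence angle $\theta_i$:

$$L_o(x,\omega_o) = L_e(x,\omega_o) + \int_{\Omega} f_r(x,\omega_i,\omega_o)\, L_i(x,\omega_i)\, \cos\theta_i \, d\omega_i$$

The unknown appears on both sides, because the incoming radiance $L_i$ at $x$ is the outgoing radiance of some other surface point. This is why only special cases admit closed-form solutions. Restricting $f_r$ to a constant $\rho/\pi$ (perfectly diffuse surfaces) leads to the radiosity formulation.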

The assumption of ray tracing is that all indirect light reflections are perfectly
specular. Ray tracing does not solve the rendering equation explicitly: it traces
photons from the camera into the 3D scene (in a backward direction), measuring
the contributions of light sources to the camera pixels along the traced paths.
Ray tracing is inherently view dependent. Its product is the picture seen by a
given camera; the illumination is not stored in the 3D model.
Radiosity assumes that all light reflections are perfectly diffuse and that
the 3D model consists of a finite number of small elements (patches). Under 
these assumptions the rendering equation can be formulated as a system of 
linear equations. The system is usually very large and the computation of all 
its coefficients is prohibitively expensive. Radiosity algorithms solve the system 
iteratively, storing the illumination in the 3D model. The radiosity method is 
view independent. A converged radiosity solution is an illuminated 3D model 
which can be viewed from different perspectives without having to rerun the 
lighting simulation. 
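The paper does not write the linear system out. In the usual notation (an assumption here), each patch $i$ has radiosity $B_i$, emitted radiosity $E_i$, diffuse reflectance $\rho_i$ and area $A_i$, and $F_{ij}$ is the form factor from patch $i$ to patch $j$. The system then reads:

$$B_i = E_i + \rho_i \sum_{j=1}^{n} F_{ij} B_j, \qquad i = 1, \dots, n, \qquad \text{with } A_i F_{ij} = A_j F_{ji}.$$

Each $F_{ij}$ is a double integral over two patches that includes a visibility term. Computing all $n^2$ coefficients is therefore what makes a direct solution prohibitively expensive.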

Data-parallel radiosity and ray tracing algorithms have been developed and 
implemented within the HiQoS project. The following sections briefly describe 
the parallelization ideas. (An asynchronous distributed memory model with mes- 
sage passing is assumed.) 



2.1 Parallel Ray Tracing 

Our ray tracing parallelization is based on a screen subdivision strategy which
exploits the independence of computations on any two pixels. Processor farming
is a natural way to parallelize this computation. However, there are two potential problems
with a straightforward implementation: bad load balancing (computation times
for different pixels differ) and no data scalability (replication of the 3D model
in each processor imposes a limit on the maximum model size).

The problem of unequal processor load can be solved under the assumption
that the ratio of the computation times on any two equally large areas of the
screen can be bounded by a constant [9]. The idea is to assign large screen areas
to processors first, then smaller ones, ending up with single pixels (or other sufficiently
small pieces). Almost linear speedups can be measured for up to 64 processors.
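The "large areas first, then smaller" idea can be sketched in Python. The sketch makes three assumptions that are not from the paper. The screen is flattened into a pixel range. Chunk sizes follow a guided self-scheduling rule, $\lceil \text{remaining} / (2P) \rceil$. A farm hands the next chunk to whichever processor becomes idle first. `decreasing_chunks`, `farm_makespan` and the per-pixel costs are hypothetical.

```python
import heapq


def decreasing_chunks(num_pixels: int, num_procs: int) -> list:
    """Split pixels 0..num_pixels-1 into contiguous chunks of non-increasing size.
    Assumed size rule: ceil(remaining / (2 * num_procs)), never below one pixel."""
    chunks, start = [], 0
    while start < num_pixels:
        remaining = num_pixels - start
        size = max(1, (remaining + 2 * num_procs - 1) // (2 * num_procs))
        chunks.append((start, size))
        start += size
    return chunks


def farm_makespan(chunks: list, pixel_cost: list, num_procs: int) -> float:
    """Processor farm: each chunk goes to the processor that becomes idle first."""
    idle_at = [0.0] * num_procs  # all zeros, already a valid min-heap
    for start, size in chunks:
        t = heapq.heappop(idle_at)
        heapq.heappush(idle_at, t + sum(pixel_cost[start:start + size]))
    return max(idle_at)


chunks = decreasing_chunks(1000, 4)
print(chunks[:3], chunks[-1])  # large chunks first, single pixels at the end
print(farm_makespan(chunks, [1.0] * 1000, 4))
```

Because the final chunks are single pixels, a processor that received an expensive chunk early is balanced by the others picking up the small tail. This is the effect the bounded-ratio assumption of [9] is meant to guarantee.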

A distributed object database is used to avoid the necessity of storing the
whole 3D model in every processor. Large models (requiring several gigabytes
of memory) can thus be rendered on currently available parallel machines. The
only limitation on the model size is the total memory of the processors used for
the computation. Each processor holds a resident subset of all objects of the 3D
model. The remaining objects (or a part of them) are stored in a cache memory.
If a non-resident object is needed during the computation, the processor stops
the computation, sends a request to the processor holding the object and makes
space for the requested object in its cache by removing some other objects from
the cache if needed. Upon receiving the requested object, the processor
resumes the computation. An LRU strategy is used for the cache management
(the Least Recently Used object is always removed from the cache). A similar
implementation is described in [10].
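The cache for non-resident objects can be sketched in Python with `collections.OrderedDict`, which keeps entries in access order. The sketch assumes a hypothetical `fetch_remote` callback standing in for the blocking request to the processor that holds the object. The class name and the counting of misses are illustrative only.

```python
from collections import OrderedDict


class ObjectCache:
    """LRU cache for non-resident scene objects on one processor (illustrative sketch)."""

    def __init__(self, capacity: int, fetch_remote):
        self.capacity = capacity
        self.fetch_remote = fetch_remote  # hypothetical: object id -> object, from the owner
        self._entries = OrderedDict()     # least recently used first
        self.misses = 0

    def get(self, obj_id):
        if obj_id in self._entries:
            self._entries.move_to_end(obj_id)  # now the most recently used
            return self._entries[obj_id]
        self.misses += 1
        obj = self.fetch_remote(obj_id)        # the computation waits for the reply here
        if len(self._entries) >= self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used object
        self._entries[obj_id] = obj
        return obj

    def cached_ids(self) -> list:
        return list(self._entries)


cache = ObjectCache(2, lambda oid: f"object {oid}")
for oid in ["a", "b", "a", "c", "b"]:
    cache.get(oid)
print(cache.misses, cache.cached_ids())  # 4 ['c', 'b']
```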



2.2 Parallel Radiosity 

A progressive refinement method [11] is used for the simulation of light propagation. Each patch of the model is a potential light source. The unshot energy
and the total reflected energy are stored by each patch. The original (sequential) progressive refinement method iteratively selects a patch with the most
unshot energy (a shooter) and shoots the whole unshot energy into the hemisphere
surrounding the shooting patch. The total reflected and unshot energies of all
other patches (receivers) are updated during the shooting according to their
visibility from the shooter (the receiving patches can be refined in this step to
store the gathered energy accurately enough). This process is iterated until the
total unshot energy drops below a threshold (or some additional termination criterion is fulfilled). The computation of visibility between two patches
(a so-called form factor) is a non-trivial problem. We are currently using a ray
casting method for the form factor computation [12]. This method randomly
generates samples on the shooter and the receiver and checks how many pairs of
shooter-receiver samples are mutually visible. The number of visible sample
pairs is used to estimate the total mutual visibility between the two patches.
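The shooting loop can be sketched in Python as follows. The sketch assumes the form factors are already given as a matrix `F[i][j]` (from patch $i$ to patch $j$) and that $F_{ii} = 0$. Patch refinement and the ray-cast form factor estimation are omitted. The reciprocity relation $A_i F_{ij} = A_j F_{ji}$ converts the energy shot by patch $i$ into a radiosity increment at receiver $j$. The function name and the threshold are illustrative.

```python
def progressive_radiosity(emission, reflectance, area, F, threshold=1e-9):
    """Sequential progressive refinement (shooting) radiosity, sketch only.
    F[i][j]: precomputed form factor from patch i to patch j (F[i][i] assumed 0).
    Returns the radiosity B of every patch."""
    n = len(emission)
    B = list(emission)       # total radiosity stored by each patch
    unshot = list(emission)  # unshot radiosity stored by each patch
    while sum(unshot[i] * area[i] for i in range(n)) > threshold:
        # select the shooter: the patch with the most unshot energy
        s = max(range(n), key=lambda i: unshot[i] * area[i])
        for j in range(n):
            if j == s:
                continue
            # energy received by j, turned into radiosity via reciprocity
            delta = reflectance[j] * unshot[s] * F[s][j] * area[s] / area[j]
            B[j] += delta
            unshot[j] += delta
        unshot[s] = 0.0  # the shooter has emitted all its unshot energy
    return B


# Two facing patches; only patch 0 emits light.
print(progressive_radiosity([1.0, 0.0], [0.5, 0.5], [1.0, 1.0], [[0.0, 0.5], [0.5, 0.0]]))
```

For this two-patch example the loop converges to the solution of $B_i = E_i + \rho_i \sum_j F_{ij} B_j$, namely $B_0 = 16/15$ and $B_1 = 4/15$, up to the threshold.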

The parallel radiosity algorithm begins with a preprocessing step (a meshing 
step) in which the model is discretized into patches (in our implementation the 
mesh consists only of triangle and quadrangle patches). The discretized model is 
partitioned into 3D subscenes which are distributed onto processors. The shooting
iterations run asynchronously in all processors. Besides the locally stored
patches, each processor maintains an incoming message queue which contains
information about shooters selected by other processors. At the beginning of 
each iteration a processor looks for the shooter with the most unshot energy 
among the patches in its message queue and the locally stored patches. Then 
it performs either an external shooting (if a shooter from the message queue 
has been selected) or an internal shooting (if a local shooter has been selected). 
A shooting influences only the locally stored patches. To spread the informa- 
tion about shooting iterations performed locally, the processor broadcasts the 
selected shooter in the case of an internal shooting [13], [14]. 
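One iteration on a single processor can be sketched in Python. The sketch assumes the message queue holds `(unshot energy, shooter id)` pairs and uses a hypothetical `broadcast` callback for the message sent after an internal shooting. The actual energy updates of the local patches are omitted. The returned tuples are illustrative, not the system's message format.

```python
def shooting_iteration(local_unshot: dict, queue: list, broadcast):
    """One asynchronous shooting iteration on a single processor (sketch).
    local_unshot: patch id -> unshot energy of the locally stored patches
    queue: (unshot energy, shooter id) pairs received from other processors
    broadcast: hypothetical callback(shooter_id, energy) used after an internal shooting
    Returns ("internal" | "external", shooter_id), or None if nothing is left to shoot."""
    best_local = max(local_unshot, key=local_unshot.get, default=None)
    best_queued = max(range(len(queue)), key=lambda k: queue[k][0], default=None)
    local_energy = local_unshot[best_local] if best_local is not None else 0.0
    queued_energy = queue[best_queued][0] if best_queued is not None else 0.0
    if max(local_energy, queued_energy) <= 0.0:
        return None
    if queued_energy > local_energy:
        _, shooter = queue.pop(best_queued)
        # external shooting: only the locally stored patches are updated (omitted)
        return ("external", shooter)
    # internal shooting: update local patches (omitted), then inform the other processors
    broadcast(best_local, local_energy)
    local_unshot[best_local] = 0.0
    return ("internal", best_local)


sent = []
local = {"a": 2.0, "b": 5.0}
queue = [(3.0, "x"), (7.0, "y")]
print(shooting_iteration(local, queue, lambda s, e: sent.append((s, e))))  # external shot of y
```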

Additional improvements to the parallelization described above include dy- 
namic load balancing (a processor’s load can be measured by the number of yet 
unshot external shooters waiting in the message queue) and two representations 
of the 3D model used to speed up form factor computations [15]. 



2.3 Simplification of Radiosity Solutions 

The diffuse illumination computed by the parallel radiosity program is stored
in the vertices of the polygon mesh (vertex radiosities). Interpolation is used to
compute the outgoing radiosities at other surface points. The original mesh gets
progressively refined during the radiosity computation (new vertices are created)
in order to store the illumination accurately enough. This refinement leads to
huge data volumes resulting from radiosity simulations (Fig. 1). The memory requirements of radiosity solutions cause problems when transferring the data to the user
over the Internet, when post-processing the scene, in interactive
walkthroughs, when rendering final pictures, etc. The following sections describe
two methods of simplification of radiosity solutions. Both methods were implemented (both sequential and data-parallel versions), and the latter was integrated
in the HiQoS Rendering System.

Fig. 1. Refinement of a polygonal mesh: (a) original mesh; (b) refined mesh



Mesh Decimation. The idea of the mesh decimation method is an iterative
deletion of edges from the model using the vertex-unify operation (joining
two adjacent vertices into one vertex). The problem of overwhelming memory
complexity also arises when 3D models are acquired with 3D scanners, and the
first mesh decimation methods were developed in this application
area [16], [17]. The presence of additional radiosity information stored in the
vertices of an illuminated model influences only the choice of the metric used in
the algorithm.

Mesh decimation algorithm

INPUT: a polygon mesh M, a desired compression ratio c
WHILE (compression ratio c is not reached)
    (1) In M, select edge e = [P1, P2] for removal
    (2) Remove edge e by joining points P1, P2 into point P
    (3) Find optimal placement for point P
OUTPUT: a simplified polygon mesh M

It is not obvious in which order the edges should be scheduled for the removal 
(line 1) and how to find the optimal placement for the vertices resulting from 
the vertex-unify operation (line 3). An objective metric is used to answer both 
questions in a single optimization step. The metric evaluates the current quality 
of the simplified mesh. The selection of the edge to be removed and the placement 
of the resulting vertex are a solution to an optimization problem which minimizes 
the quality reduction over all possible edge selections and all possible placements 
of the resulting vertex. The choice of the metric determines the complexity of 
the optimization problem (costs of evaluation of the metric alone must also be 
considered) and influences the final visual quality of the simplified mesh. We
used a quadric metric [18], [19] which takes the vertex radiosities into account
[20].
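The loop of the mesh decimation algorithm can be sketched in Python on a triangle mesh. Two stand-ins replace the paper's choices, so the sketch shows only the greedy structure of steps (1) to (3). The cost is the squared edge length rather than the radiosity-aware quadric metric. The merged point is placed at the edge midpoint rather than at an optimal position. The compression ratio is read as the fraction of vertices to remove, which is also an assumption.

```python
import math


def edges_of(triangles):
    """Undirected edges (u, v) with u < v of a triangle list."""
    edges = set()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((min(u, v), max(u, v)))
    return edges


def decimate(vertices, triangles, ratio):
    """Greedy edge collapse removing about `ratio` of the vertices.
    Stand-ins (assumptions): cost = squared edge length, new point = edge midpoint."""
    verts = {i: tuple(p) for i, p in enumerate(vertices)}
    tris = [tuple(t) for t in triangles]
    target = len(verts) - int(ratio * len(verts))
    while len(verts) > target:
        edges = edges_of(tris)
        if not edges:
            break  # nothing left to collapse
        # (1) select the edge whose removal is cheapest under the metric
        u, v = min(edges, key=lambda e: math.dist(verts[e[0]], verts[e[1]]) ** 2)
        # (2) unify v into u; (3) place the merged point (here: the midpoint)
        verts[u] = tuple((a + b) / 2 for a, b in zip(verts[u], verts[v]))
        del verts[v]
        tris = [tuple(u if w == v else w for w in t) for t in tris]
        tris = [t for t in tris if len(set(t)) == 3]  # drop collapsed triangles
    return verts, tris


# A 3 x 3 grid of vertices split into 8 triangles.
grid = [(x, y, 0.0) for y in range(3) for x in range(3)]
tris = [t for y in range(2) for x in range(2)
        for a in [3 * y + x] for t in ((a, a + 1, a + 4), (a, a + 4, a + 3))]
v, t = decimate(grid, tris, 0.2)
print(len(v), len(t))
```

A faithful implementation would replace the cost with the quadric error of [18], [19], extended by the radiosity term of [20]. It would also keep edge costs in a priority queue instead of rescanning all edges on each iteration.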

One problem of the mesh decimation method is that the visual quality of the
simplified mesh is not guaranteed. Mesh decimation can be seen as a lossy compression of a model. Whereas compression ratios of up to 90% can be achieved
while retaining an acceptable visual quality for some scenes, the visual error at the
same compression ratios is high for other scenes. The compression ratio can be
chosen interactively (using the operator’s visual perception as a quality measure),
but interaction makes the method unsuitable for use in an automated remote
rendering system.



Radiosity Maps. The method of radiosity maps provides a lossless compression
of illuminated meshes using a more efficient representation of the illumination. The vertex radiosities are stored in radiosity maps (texture maps) instead
of in mesh vertices. One radiosity map is assigned to each object (an object
is a set of patches forming a logical element, e.g. chair, table, staircase, etc.).
Additionally, uv-mapping coordinates are stored in the vertices of the resulting
mesh. The entire substructuring information is removed from the mesh. Typical
compression ratios are 70%-80%. Moreover, texture mapping (see Fig. 3 (a)) is
supported by the hardware of recent graphics cards, which significantly increases
frame rates in interactive walkthroughs.

Computation of radiosity maps

INPUT: a mesh M with vertex radiosities
FOR (each object obj_k in mesh M)
    FOR (each patch p_i in object obj_k)
        (1) Find optimal resolution of radiosity map for patch p_i
        (2) Create radiosity map pmap_i for patch p_i
        (3) Fill radiosity map pmap_i for patch p_i (Fig. 2)
    (4) Pack patch maps pmap_i into object map omap_k (Fig. 3 (b))
    (5) Remove substructuring from all patches p_i in object obj_k
    (6) Assign uv-mapping coordinates to all patches p_i in object obj_k
OUTPUT: a simplified mesh M with radiosity maps assigned to objects
        (substructuring information removed)

Optimal resolution of a patch map (line 1) is the minimal resolution allowing
a lossless representation of the original illumination information. This resolution depends on the depth of the substructuring of a given patch (examples are
shown in Fig. 2).

An optimal packing of patch maps into an object map (line 4) is an NP-hard 
problem. A heuristic is used in this step. The heuristic tries to keep the object 
map as small as possible and to use the rectangular area of the object map 
efficiently (Fig. 3 (b)). 
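The paper does not name its packing heuristic. As a stand-in, the Python sketch below uses simple shelf packing. Patch maps, sorted by decreasing height, are placed left to right on horizontal shelves of an object map with fixed width. The resulting map height can be compared with the lower bound from Fig. 3 (b): the total area of the patch maps divided by the map width. `shelf_pack` and the fixed width are assumptions.

```python
def shelf_pack(sizes: list, width: int):
    """Shelf packing of patch maps (w, h) into an object map of fixed width.
    Tallest maps first; each shelf is filled from left to right.
    Returns (positions, height); positions[i] is the (x, y) corner of map i."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i][1], reverse=True)
    positions = [None] * len(sizes)
    x = y = shelf_height = 0
    for i in order:
        w, h = sizes[i]
        if w > width:
            raise ValueError(f"patch map {i} is wider than the object map")
        if x + w > width:  # the current shelf is full: open a new one above it
            y += shelf_height
            x = shelf_height = 0
        positions[i] = (x, y)
        x += w
        shelf_height = max(shelf_height, h)
    return positions, y + shelf_height


sizes = [(3, 2), (2, 4), (5, 1), (1, 1), (4, 3)]
positions, height = shelf_pack(sizes, 6)
print(positions, height)
print(sum(w * h for w, h in sizes) / 6)  # area lower bound on the height
```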
Fig. 2. Optimal resolutions of radiosity maps: (a) a quadrangle patch and the corresponding map; (b) a triangle patch and the corresponding map.
Original vertex radiosities are stored in the marked texels. The colours of empty texels
are interpolated from the marked ones.




Fig. 3. (a) Mapping of a radiosity map onto a 3D object (uv-texture mapping). The uv-mapping coordinates are stored in the object’s vertices; (b) Packing of patch radiosity
maps into an object radiosity map. The dashed area corresponds to the total area of
the patch maps (a lower bound for the optimal object map size).



3 Architecture of the HiQoS Rendering System 

The HiQoS Rendering System is a distributed system consisting of four subsystems: Client, Service Broker, Scheduling Subsystem and Rendering Server. All of
these subsystems except the Service Broker, the system’s central entry point, may be
replicated. The subsystems and their components communicate
via TCP/IP sockets. No shared file system is needed. A possible configuration
of the system is shown in Fig. 4. The main objectives of this architecture are:

— minimal hardware and software requirements on the user’s side (only an 
Internet connection and a web browser are needed) 

— transparent access to high performance computing systems (the user does 
not know where the computation takes place or what computing systems are 
used for computing his job) 

— good resource utilization (especially when rendering walkthroughs consisting of
many frames).
Fig. 4. An example configuration of the HiQoS Rendering System



3.1 Client 

The Client subsystem is responsible for a user’s authorization by the Service 
Broker, submission of a rendering job and providing input data needed by the job. 
The input data generated automatically by the user’s modelling software describe 
a 3D scene: its surface geometry, surface materials, light sources and cameras. 
Extended VRML 2 and 3DS formats are currently supported. It is practical 
to separate the camera information from the rest of the scene description (a 
proprietary camera format is used which allows a definition of single cameras as 
well as camera paths). The rest of the data are controlling parameters specific 
to the selected illumination method. These are provided by a human operator 
who fills out an HTML form when submitting a rendering job. 

The 3D scene data can be large. They bypass the Service Broker and flow
from the Client directly to the Global Scheduler. Two transmission scenarios
were considered:

— A passive client scenario, in which the user places the exported scene data in
a public area on the Internet (e.g. the user’s web area) and tells the rendering
system where they are (URLs). This is the currently implemented scenario.
The main disadvantages of this scenario are an additional load in the system
related to downloading the user’s data and a violation of the privacy of
the data. An advantage is a simple implementation.

— An active client scenario, in which the user (e.g. a software component integrated in the user’s 3D modeller) uploads the data to the rendering system.
Advantages are the possibility of accessing the remote rendering
system directly from the user’s modeller, the possibility of maintaining an
object cache on the rendering system’s side for reducing the transmission
costs, ensuring the privacy of the transmitted scenes, etc. A disadvantage is
that additional software components and communication protocols between
them must be implemented.



3.2 Service Broker 

The Service Broker is the system’s entry point. This software component com- 
municates with users (and administrators of the system), accepts new jobs, gen- 
erates a unique id for each accepted job and delivers computed results to users. 
The Service Broker is implemented as an Apache web server (PHP scripts are 
used for implementing the communication with the Global Scheduler and for 
accessing databases kept on the server). 



3.3 Scheduling Subsystem 

The Scheduling subsystem consists of a Global Scheduler and several Local Schedulers. The former downloads the job data, converts the data, distributes jobs
(or their parts) to the network of Local Schedulers, collects the results (or partial
results) from Local Schedulers and passes the results and status information
to the Service Broker. There is usually only one Global Scheduler running in the
entire system.

Different parallel computing systems in different computing centers can be 
used in the system. There is usually one instance of the Local Scheduler running 
in one computing center. The main task of the Local Scheduler is to hide the differences between the parallel systems, providing a uniform interface for allocation
and deallocation of processors, for starting a parallel application on allocated
processors, etc.

Such a two-level scheduling scheme is modular and also helps to utilize the computing resources efficiently when rendering ray traced walkthrough animations.
In this case the Global Scheduler acts as a farmer distributing single frames to
several Local Schedulers. After having computed a frame, the Local Scheduler
sends the frame (a picture) to the Global Scheduler and in turn receives a new frame
to compute. Between the computations of two subsequent frames, the 3D scene
persists in the memories of the running parallel processors on the Local Scheduler's side.
Thus only a short description of the new camera must be transmitted between
the schedulers with each single frame.
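The farming scheme can be simulated in a few lines of Python under simplifying assumptions. Each Local Scheduler has a fixed per-frame rendering time, and network transmission time is ignored. The sketch counts the transmitted volume: the scene is sent once per Local Scheduler, and after that only a camera description per frame. `farm_frames` and its parameters are hypothetical.

```python
import heapq


def farm_frames(num_frames, frame_time, scene_bytes, camera_bytes):
    """Global Scheduler farming frames to Local Schedulers (simulation sketch).
    frame_time[s]: assumed rendering time of one frame on Local Scheduler s.
    Transmission times are ignored; only the transmitted volume is counted."""
    idle = [(0.0, s) for s in range(len(frame_time))]  # (time idle, scheduler)
    heapq.heapify(idle)
    assignment, sent, has_scene = {}, 0, set()
    for frame in range(num_frames):
        t, s = heapq.heappop(idle)  # the Local Scheduler that reports back first
        if s not in has_scene:      # first frame on s: ship the whole 3D scene
            sent += scene_bytes
            has_scene.add(s)
        sent += camera_bytes        # every frame: only the new camera
        assignment[frame] = s
        heapq.heappush(idle, (t + frame_time[s], s))
    makespan = max(t for t, _ in idle)
    return assignment, sent, makespan


assignment, sent, makespan = farm_frames(10, [1.0, 2.0], 1000, 10)
print(sent, makespan)  # 2100 7.0
```

The faster Local Scheduler automatically receives more frames: 7 of the 10 in this example. The large scene is transmitted only twice.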



3.4 Rendering Server 

Data-parallel ray tracing and data-parallel radiosity algorithms are currently 
supported by the HiQoS Rendering System. These programs are precompiled 
for the target parallel systems in computing centers. The actual implementation
of the parallel global illumination algorithms is hidden in the Rendering Server.
The Rendering Server provides an interface to a Local Scheduler allowing starting,
continuation and termination of a chosen parallel program on an allocated
partition of a parallel system. The interface is independent of the illumination
method, even though the methods differ considerably from each other (also in
input and output parameters). A radiosity job is handled by a Local Scheduler
in the same way as a ray tracing job with one camera.
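A method-independent interface of this kind could look like the Python sketch below. All class and method names are hypothetical, and the two implementations only stand in for the real precompiled parallel programs. The point is that `run_job`, playing the role of the Local Scheduler, drives a ray tracing job and a radiosity job with exactly the same calls.

```python
from abc import ABC, abstractmethod


class ParallelRenderer(ABC):
    """Hypothetical illumination-method-independent interface of the Rendering Server."""

    @abstractmethod
    def start(self, processors: int, job: dict) -> None:
        """Launch the precompiled parallel program on an allocated partition."""

    @abstractmethod
    def step(self) -> bool:
        """Continue the computation; return True once it has finished."""

    @abstractmethod
    def result(self):
        """Return the method-specific output."""

    def terminate(self) -> None:
        """Stop the program and release the partition (no-op in this sketch)."""


class RayTracer(ParallelRenderer):
    def start(self, processors, job):
        self.cameras, self.pictures = list(job["cameras"]), []

    def step(self):
        camera = self.cameras.pop(0)
        self.pictures.append(f"picture for {camera}")  # stands in for a rendered frame
        return not self.cameras

    def result(self):
        return self.pictures


class Radiosity(ParallelRenderer):
    def start(self, processors, job):
        self.iterations_left = job["iterations"]

    def step(self):
        self.iterations_left -= 1  # stands in for a batch of shooting iterations
        return self.iterations_left <= 0

    def result(self):
        return "illuminated 3D model"


def run_job(renderer: ParallelRenderer, processors: int, job: dict):
    """What a Local Scheduler does, identically for every illumination method."""
    renderer.start(processors, job)
    try:
        while not renderer.step():
            pass
        return renderer.result()
    finally:
        renderer.terminate()


print(run_job(RayTracer(), 8, {"cameras": ["cam0"]}))
print(run_job(Radiosity(), 16, {"iterations": 3}))
```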

4 Evaluation of the HiQoS Rendering System 

The remote rendering system has been evaluated in two industrial scenarios:
architectural visualization and film production. Although the final goal of both
scenarios is the synthesis of photorealistic pictures, the way of achieving this goal
differs. The user of the first scenario was the company GPO mbH, which creates
complex CAD models using the software Speedikon (by IEZ AG) and Arcon:
Architektur Visualisierung (by mb-Software AG). The requirement on the rendering system was the synthesis of high quality visualization pictures and camera
animations of the models. The results have been used for supporting the building contractors during the planning phase and for a public presentation of the
models. The parallel ray tracing offered by the HiQoS Rendering System significantly reduces the rendering times in comparison to a sequential
computation on a PC. The measured total overhead of the system (the time spent
downloading the models and scheduling, relative to the effective rendering time)
is below 10% already for small rendering jobs consisting of about 10
frames (and smaller for more complex jobs).

The user of the film production scenario was the company Upstart!
GmbH, which creates special visual effects for movies (e.g. “Operation Noah”)
and advertisements. 3D Studio Max (by Discreet) is used for the 3D modeling.
A frequent problem is a realistic illumination of building interiors which do not
really exist. A sequence of pictures is not desirable as a product of the global
illumination simulation because this would drastically reduce the flexibility in
the final composition. Rather than pictures, an explicit 3D representation of the
illuminated model is required as an intermediate result of the simulation. The
(sequential) radiosity of Lightscape (by Discreet) is used in this step of the original production chain. The sequential radiosity computation of a complex model
can take many hours, sometimes days. Another, even more serious problem is posed by
the run-time memory requirements of the radiosity method, which sometimes
exceed the capacity of a PC. The data-parallel radiosity integrated in the
HiQoS Rendering System leads to shorter computation times and overcomes the
memory problems by using more processors with more total memory. The radiosity maps described in section 2.3 are used for the compression of the resulting
data.

5 Conclusions 

We described a parallel rendering system which allows computation of the global
illumination of complex 3D models sent by users via the Internet. Currently,
data-parallel ray tracing and radiosity illumination methods are supported by
the system. The system was integrated as a prototype of an e-commerce service
and successfully evaluated in two industrial scenarios. Possible extensions and
research areas include implementations of further global illumination methods,
seamless integration of the remote rendering service with existing 3D modellers,
support for dynamic scenes and better resource management.

Fig. 5. (a) Model of the Dom in Wetzlar. Global diffuse illumination was computed
by radiosity; textures and camera effects were added in a subsequent rendering step.
(b) Ray traced model of a furnished house
Abstract. We introduce restarting automata as two-way automata in
order to obtain a more general model which is closer to our linguistic
motivations. We study the notion of j-monotonicity to show the advantages of this model. We show that j-monotonicity can be considered
as a degree of non-context-freeness and that it is a robust notion with respect
to the considered models of restarting automata. Some other aspects
concerning the power and applications of two-way restarting automata are
mentioned.



1 Introduction 

Roughly speaking, a restarting automaton accepts a word by a sequence of changes (cycles), where each change results in a shorter word on which the automaton restarts.

One important (linguistic) motivation for studying restarting automata is to model the analysis by reduction of natural language sentences in a way similar to [9]. The analysis by reduction consists of a gradual simplification of a full sentence such that the (in)correctness of the sentence is not affected. Thus, after a certain number of steps, either a simple sentence is obtained or an error is found. Two-way restarting automata are more adequate for this purpose than one-way restarting automata because they allow an easier (deterministic) implementation of the desired ’global outlook’ on the whole reduced sentence. We feel that two-way restarting automata are closer to the human procedure of performing analysis by reduction. Two-way restarting automata can be considered not only as a generalization of one-way restarting automata but also as a generalization of contraction automata (see [10]).

This paragraph further explains our linguistic motivations. In order to characterize a type of complexity of different (natural) languages and of their analysis by reduction, we introduce and formally study the notion of the degree of (non)monotonicity. In this way we obtain a new formal characterization of the still vague notion of word-order complexity of (natural) languages. In [2], the word-order complexity of languages was measured using a measure of nonprojectivity of dependency (or syntactic) trees. We can use the degree of (non)monotonicity in a similar way as the measure of nonprojectivity was used in [3]. We have shown there that the degree of word-order complexity of English is significantly lower than that of Czech or Latin. Using our results, it should also be possible to show that the ’syntax’ of Czech is ’more distant’ from context-free syntax than the ’syntax’ of English. Let us note that the degree of (non)monotonicity uses much more elementary (more easily observable) syntactic features of natural languages than the measure of nonprojectivity of D-trees. The formal results presented in this paper should demonstrate the idea that j-monotonic two-way restarting automata are a formal tool useful for analyzers which perform analysis by reduction for natural languages, particularly for natural languages with a high degree of word-order freeness. Let us recall that analysis by reduction (formal or informal, implicit or explicit) is a starting point for the development of syntactic parsers (or of formal descriptions of syntax and its (un)ambiguity) of natural languages.

Now we focus on the technical content of the paper. Roughly speaking, (non)monotonicity of degree 1 of a computation C means that the sequence of distances between the individual changes in the recognized word (sentence) and the right end of the word does not increase during C. (Non)monotonicity of degree j (j-monotonicity) of a computation means that this sequence of distances can be obtained as a ’shuffle’ of j (or fewer) such non-increasing sequences.

We started to study j-monotonicity only recently (see [7]). This paper stresses the power of (deterministic) two-way automata and the robustness of the basic observation that j-monotonicity determines several infinite scales of languages and of their analyses by reduction.

2 Definitions 

Throughout the paper, $\lambda$ denotes the empty word. We start with a definition of restarting automata, in the special forms in which we shall be interested.

A two-way restarting automaton M, formally presented as a tuple $M = (Q, \Sigma, \Gamma, k, I, q_0, Q_A, Q_R)$, has a control unit with a finite set $Q$ of (control) states, a distinguished initial state $q_0 \in Q$, and two disjoint sets $Q_A, Q_R \subseteq Q$ of halting states, called accepting and rejecting, respectively. M further has one head moving on a finite linear list of items (cells). The first item always contains a special symbol $\cent$, i.e., the left sentinel, the last item always contains another special symbol $\$$, i.e., the right sentinel, and each other item contains a symbol either from an input alphabet $\Sigma$ or from a working alphabet $\Gamma$; we stipulate that $\Sigma$ and $\Gamma$ are finite disjoint sets and $\cent, \$ \notin \Sigma \cup \Gamma$. We assume that the head has a look-ahead window of size $k$ ($k \ge 1$), so that it always scans $k$ consecutive items. There is one exception: when the right sentinel $\$$ appears in the window, M can also scan fewer than $k$ symbols ($\$$ is the last scanned symbol).
A configuration of M is a string $\alpha q \beta$ where $q \in Q$, and either $\alpha = \lambda$ and $\beta \in \{\cent\} \cdot (\Sigma \cup \Gamma)^* \cdot \{\$\}$, or $\alpha \in \{\cent\} \cdot (\Sigma \cup \Gamma)^*$ and $\beta \in (\Sigma \cup \Gamma)^* \cdot \{\$\}$; here $q$ represents the current (control) state and $\alpha\beta$ the current contents of the list of items, while it is understood that the head scans the first $k$ symbols of $\beta$ (or the whole $\beta$ when $|\beta| < k$). A restarting configuration, for a word $w \in (\Sigma \cup \Gamma)^*$, is of the form $q_0 \cent w \$$; if $w \in \Sigma^*$, it is an initial configuration.

As usual, a computation of M (for an input word $w \in \Sigma^*$) is a sequence of configurations starting with an initial configuration, where two consecutive configurations are in the relation $\vdash_M$ (denoted $\vdash$ when M is clear from the context) induced by a finite set $I$ of instructions. Each instruction is of one of the following four types:

(1) $(q, \gamma) \to (q', \mathrm{MVR})$ (2) $(q, \gamma) \to (q', \mathrm{MVL})$

(3) $(q, \gamma) \to (q', \mathrm{REWRITE}(\gamma'))$ (4) $(q, \gamma) \to \mathrm{RESTART}$

Here $q$ is a nonhalting state ($q \in Q - (Q_A \cup Q_R)$), $q' \in Q$, and $\gamma, \gamma'$ are “look-ahead window contents” (i.e., $k \ge |\gamma|$), where $|\gamma| > |\gamma'|$ (i.e., rewriting must shorten the word).

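Type (1) (moving right), where $\gamma = a\beta$, $a$ being a symbol, induces $\alpha q a\beta\delta \vdash \alpha a q' \beta\delta$. Type (2) (moving left), where $\alpha = \alpha_1 b$, $\gamma = a\beta$, $a, b$ being symbols, induces $\alpha q a\beta\delta \vdash \alpha_1 q' b a\beta\delta$. Type (3) (rewriting) induces $\alpha q \gamma\delta \vdash \alpha\gamma' q' \delta$; but if $\gamma = \beta\$$, in which case $\gamma' = \beta'\$$, we have $\alpha q \beta\$ \vdash \alpha\beta' q' \$$. Type (4) (restarting) induces $\alpha q \gamma\delta \vdash q_0 \alpha\gamma\delta$.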
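The one-step relation induced by the four instruction types can be traced on plain strings. The following Python sketch is only an illustration of the definitions above, not part of the formal model: it represents a configuration αqβ as a triple (alpha, state, beta), writes the sentinels as the characters ¢ and $, and implements the effect of each instruction type. The names Config, window, move_right, move_left, rewrite and restart are hypothetical; choosing which instruction applies (and hence any nondeterminism) is left to the caller.

```python
from dataclasses import dataclass

RIGHT = "$"  # right sentinel; the left sentinel is written as "¢"


@dataclass(frozen=True)
class Config:
    """Configuration alpha q beta: the head scans the first k symbols of beta."""
    alpha: str
    state: str
    beta: str


def window(c: Config, k: int) -> str:
    # The look-ahead window: k symbols, or fewer if the right sentinel is reached.
    return c.beta[:k]


def move_right(c: Config, q2: str) -> Config:
    # Type (1): alpha q a beta delta |- alpha a q' beta delta
    if c.beta == RIGHT:
        raise ValueError("cannot move past the right sentinel")
    return Config(c.alpha + c.beta[0], q2, c.beta[1:])


def move_left(c: Config, q2: str) -> Config:
    # Type (2): alpha_1 b q gamma delta |- alpha_1 q' b gamma delta
    if not c.alpha:
        raise ValueError("cannot move left of the left sentinel")
    return Config(c.alpha[:-1], q2, c.alpha[-1] + c.beta)


def rewrite(c: Config, k: int, new: str, q2: str) -> Config:
    # Type (3): alpha q gamma delta |- alpha gamma' q' delta, with |gamma'| < |gamma|
    gamma, delta = c.beta[:k], c.beta[k:]
    if len(new) >= len(gamma):
        raise ValueError("rewriting must shorten the word")
    if gamma.endswith(RIGHT):
        # gamma = beta$ and gamma' = beta'$: the head ends on the right sentinel
        if not new.endswith(RIGHT):
            raise ValueError("the right sentinel must be preserved")
        return Config(c.alpha + new[:-1], q2, RIGHT)
    return Config(c.alpha + new, q2, delta)


def restart(c: Config, q0: str) -> Config:
    # Type (4): alpha q gamma delta |- q0 alpha gamma delta
    return Config("", q0, c.alpha + c.beta)
```

For example, starting from the restarting configuration q0¢aabbc$, two right moves followed by rewriting the window ab (k = 2) to the empty word give the configuration ¢a q2 bc$, and a restart then yields q0¢abc$.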

We assume that there is an instruction with the left-hand side $(q, \gamma)$ for each nonhalting state $q$ and each (possible) $\gamma$ (i.e., in any configuration a further computation step is possible iff the current state is nonhalting). In general, the automaton is nondeterministic, i.e., there can be two or more instructions with the same left-hand side $(q, \gamma)$, and thus there can be more than one computation for an input word. If this is not the case, the automaton is deterministic.

An input word $w \in \Sigma^*$ is accepted by M if there is a computation which starts with the initial configuration $q_0 \cent w \$$ and finishes with an accepting configuration, i.e., one in which the current control state is accepting. $L(M)$ denotes the language consisting of all words accepted by M; we say that M recognizes (accepts) the language $L(M)$.

We observe that any finite computation of a two-way restarting automaton M consists of certain phases. A phase called a cycle starts in a (re)starting configuration, and the head moves along the input list until a restart operation is performed and thus a new restarting configuration is reached. If no further restart operation is performed, any finite computation necessarily finishes in a halting configuration; such a phase is called a tail. For technical convenience, we assume that M performs at least one rewrite operation during any cycle (this is controlled by the finite state control unit); thus each new phase starts on a shortened word.

We use the notation $u \Rightarrow_M v$ meaning that there is a cycle of M beginning with the restarting configuration $q_0 \cent u \$$ and finishing with the restarting configuration $q_0 \cent v \$$; the relation $\Rightarrow_M^*$ is the reflexive and transitive closure of $\Rightarrow_M$.

We sometimes (implicitly) use the following obvious fact: 
Fact 1. Error preserving property:

Let M be a restarting automaton, and $u, v$ two words in its input alphabet. If $u \Rightarrow_M^* v$ and $u \notin L(M)$, then $v \notin L(M)$.

Correctness preserving property:

Let M be a deterministic restarting automaton, and $u, v$ two words in its input alphabet. If $u \Rightarrow_M^* v$ and $u \in L(M)$, then $v \in L(M)$.

Now we define the subclasses of restarting automata relevant to our considerations.

— An RLWW-automaton is a two-way restarting automaton which performs exactly one REWRITE-instruction in each cycle (this is controlled by the finite state control unit).

— An RLW-automaton is an RLWW-automaton whose working alphabet is empty. Note that each restarting configuration is initial in this case.

— An RL-automaton is an RLW-automaton whose rewriting instructions can be viewed as deleting, i.e., in the instructions of type (3), $\gamma'$ is obtained by deleting some symbols (possibly a non-contiguous subsequence) from $\gamma$.

— An RRWW-automaton is an RLWW-automaton which does not perform instructions of type (2) (moving to the left) at all.

— An RRW-automaton is an RRWW-automaton whose working alphabet is empty.

— An RR-automaton is an RRW-automaton whose rewriting instructions can be viewed as deleting.

We recall the notion of monotonicity for a computation of an RLWW-automaton and introduce the notion of j-monotonicity. Any cycle C contains a unique configuration $\alpha q \beta$ in which a rewriting instruction is applied. We call $|\beta|$ the right distance (r-distance) of C and denote it $D_r(C)$. We say that a sequence of cycles $Sq = (C_1, C_2, \dots, C_n)$ is monotonic iff $D_r(C_1) \ge D_r(C_2) \ge \dots \ge D_r(C_n)$. A computation is monotonic iff the respective sequence of cycles is monotonic. (The tail of the computation does not play any role here.)

Let j be a natural number. We say that the sequence of cycles $Sq = (C_1, C_2, \dots, C_n)$ is j-monotonic iff (informally speaking) there is a partition of Sq into j subsequences, each of which is monotonic. I.e., the sequence of cycles Sq is j-monotonic iff there is a partition of Sq into j subsequences

$Sb_1 = (C_{p_{11}}, C_{p_{12}}, \dots, C_{p_{1 l_1}})$,
$Sb_2 = (C_{p_{21}}, C_{p_{22}}, \dots, C_{p_{2 l_2}})$,
...
$Sb_j = (C_{p_{j1}}, C_{p_{j2}}, \dots, C_{p_{j l_j}})$,

where each $Sb_i$, $1 \le i \le j$, is monotonic. Obviously, a sequence of cycles $(C_1, C_2, \dots, C_n)$ is not j-monotonic iff there exist $1 \le i_1 < i_2 < \dots < i_{j+1} \le n$ such that $D_r(C_{i_1}) < D_r(C_{i_2}) < \dots < D_r(C_{i_{j+1}})$.

A computation is j-monotonic iff the respective sequence of cycles is j-monotonic. (The tail of the computation does not play any role here.) We can see that 1-monotonicity means monotonicity.
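Reading monotonicity as non-increasing right distances (the ≥ relations above), the last characterization says that a sequence of cycles is j-monotonic exactly when it contains no strictly increasing subsequence of length j + 1; this is the dual form of Dilworth's theorem (Mirsky's theorem). The following Python sketch, written under that reading, takes the list of right distances D_r(C_1), …, D_r(C_n) and computes the least such j by patience sorting, and also builds an explicit partition by a first-fit rule. The function names are hypothetical.

```python
from bisect import bisect_left


def min_monotonic_cover(distances):
    """Least j such that the sequence of right distances is j-monotonic.

    Assumes "monotonic" means non-increasing. The least number of
    non-increasing subsequences covering a sequence equals the length of its
    longest strictly increasing subsequence, computed here by patience sorting.
    """
    tails = []  # tails[m]: least possible last element of a strictly increasing run of length m + 1
    for d in distances:
        pos = bisect_left(tails, d)  # bisect_left: equal values never extend a run
        if pos == len(tails):
            tails.append(d)
        else:
            tails[pos] = d
    return len(tails)


def is_j_monotonic(distances, j):
    return min_monotonic_cover(distances) <= j


def greedy_partition(distances):
    """Split cycle indices into non-increasing subsequences (first-fit).

    Each distance joins the first subsequence whose last distance is >= it;
    this first-fit rule uses exactly min_monotonic_cover(distances) parts.
    """
    parts = []
    for idx, d in enumerate(distances):
        for part in parts:
            if distances[part[-1]] >= d:
                part.append(idx)
                break
        else:
            parts.append([idx])
    return parts
```

For example, the distances 5, 3, 6, 4, 2, 7, 1 need three monotonic subsequences, namely 5, 3, 2, 1 and 6, 4 and 7, because 3, 4, 7 is strictly increasing.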
Let j be a natural number. An RLWW-automaton M is j-monotonic iff all its computations are j-monotonic.

Notation. For brevity, Nat denotes the set of natural numbers, the prefix det- denotes the property of being deterministic, and the prefix j-mon- denotes j-monotonicity. $\mathcal{L}(A)$, where A is some class of automata, denotes the class of languages recognizable by the automata from A. E.g., $\mathcal{L}(\text{det-}j\text{-mon-RRW})$ denotes the class of languages recognizable by deterministic j-monotonic RRW-automata. CFL denotes the class of context-free languages, DCFL the class of deterministic context-free languages. The sign $\subset$ means the proper subset relation. We will sometimes write regular expressions instead of the respective regular languages. Similarly, we will write only mon instead of 1-mon.

3 Results 

Proposition 2. Let $M_L$ be an RLWW-automaton. Then there is a (nondeterministic) RRWW-automaton $M_R$ such that $L(M_L) = L(M_R)$, $u \Rightarrow_{M_L} v$ iff $u \Rightarrow_{M_R} v$, and the right distances in the corresponding cycles of $M_L$ and $M_R$ are equal.

Proof. We outline the main idea only. We need to construct $M_R$ in such a way that it is able to simulate any cycle of $M_L$. For this purpose we can use the technique of simulating two-way finite automata by one-way finite automata (the technique of crossing functions). We need to ensure the simulation of the first part of the cycle of $M_L$ (the part before the single rewriting) and, at the same time, the simulation of the second part of the cycle (the part after the rewriting). This means that $M_R$ needs to use two sets of crossing functions (instead of one) in order to check the correctness of the simulated cycle. □

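Theorem 3. It holds for any $j \in Nat$:

$\mathcal{L}(\text{RRWW}) = \mathcal{L}(\text{RLWW})$, $\mathcal{L}(j\text{-mon-RRWW}) = \mathcal{L}(j\text{-mon-RLWW})$,
$\mathcal{L}(j\text{-mon-RRW}) = \mathcal{L}(j\text{-mon-RLW})$, $\mathcal{L}(j\text{-mon-RR}) = \mathcal{L}(j\text{-mon-RL})$.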

Proof. The theorem follows from the previous proposition. □ 

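Theorem 4. $\mathcal{L}(\text{mon-RRWW}) = \mathcal{L}(\text{mon-RLWW}) = \text{CFL}$.

Proof. The fact $\mathcal{L}(\text{mon-RRWW}) = \text{CFL}$ was shown, e.g., in [1]. The theorem follows from the equality $\mathcal{L}(\text{mon-RRWW}) = \mathcal{L}(\text{mon-RLWW})$ shown in Theorem 3. □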

The following fact is formulated in [1] for RRWW-automata. It holds for RLWW-automata because of Proposition 2 and Theorem 3.

Fact 5. For any RLWW-automaton M there is a constant p such that the following holds. Let $uvw \Rightarrow_M uv'w$, and let $u_2$ be a subword of the word $u$ with $|u_2| = p$, i.e., we can write $u = u_1u_2u_3$. Then we can find a nonempty subword $z_2$ such that we can write $u_2 = z_1z_2z_3$, and $u_1z_1(z_2)^iz_3u_3vw \Rightarrow_M u_1z_1(z_2)^iz_3u_3v'w$ for all $i \ge 0$ (i.e., $z_2$ is a “pumping subword” in the respective cycle). Similarly, such a pumping subword can be found in any subword of length p of the word w.
Theorem 6. $\mathcal{L}(\text{det-}j\text{-mon-RRWW}) = \text{DCFL}$ holds for any $j \in Nat$.

Proof. We only outline the idea of the proof here. Let us suppose that M is a det-j-mon-RRWW-automaton and k is the size of its look-ahead window. We can see that M computes in such a way that in any cycle it scans at least one item which was scanned (rewritten) by M in the previous cycle by its (single) rewriting instruction. Hence, during a computation of M, the right distance can increase by less than $k \cdot j$ only. It means that any computation (cycle) of M can be simulated by a computation of a det-mon-RRWW-automaton $M_{jk}$ with a look-ahead window of size $k \cdot j$. Since $\mathcal{L}(\text{det-mon-RRWW}) = \text{DCFL}$ (see [6]), the theorem is proved. □

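Theorem 7. $\text{DCFL} = \mathcal{L}(\text{det-mon-RR}) \subset \mathcal{L}(\text{det-mon-RL})$.

Proof. The fact $\text{DCFL} = \mathcal{L}(\text{det-mon-RR})$ was proved in [6]. We will prove that the language $L_a = L_c \cup L_d$, where $L_c = \{a^nb^nc \mid n \ge 0\}$ and $L_d = \{a^nb^{2n}d \mid n \ge 0\}$, is a det-mon-RL-language.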

$L_a$ is recognized by a det-mon-RL-automaton M which works as follows:

— M accepts if it scans $\cent c\$$, $\cent d\$$ or $\cent abc\$$; otherwise

— M moves to the right end in order to check whether the input word has the form $a^+b^+c$. If the check is positive, M moves to the “middle” (an a followed by a different symbol) and deletes ab. If the previous check is not successful, M moves to the right end in order to check whether the input word has the form $a^+b^+bd$. If this check is positive, M moves to the “middle” and deletes abb. If the last check is not successful, M rejects; otherwise it restarts.

On the other hand, it was shown, e.g., in [6] that $L_a \notin \text{DCFL}$. This completes the proof. □
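The behaviour of M can be traced at the level of cycles. The Python sketch below is hypothetical and abstracts away the head movements; it assumes that the second form checked by M is a^+b^+bd (at least two b's, so that abb can be deleted), and it measures the right distance of a cycle as the length of the suffix starting at the deleted ab or abb, including the right sentinel. On accepted inputs the recorded distances decrease, which is the monotonicity claimed for M.

```python
import re


def run_la_automaton(word):
    """Cycle-level sketch of the det-mon-RL-automaton for L_a = L_c ∪ L_d.

    Returns (accepted, right_distances). The right distance of a cycle is taken
    as the length of the list suffix starting at the deleted symbols, counting
    the right sentinel $ (an assumed convention for |beta|).
    """
    distances = []
    w = word
    while True:
        if w in ("c", "d", "abc"):           # words accepted immediately
            return True, distances
        if re.fullmatch(r"a+b+c", w):         # first check: form a+ b+ c
            deleted = "ab"
        elif re.fullmatch(r"a+b+bd", w):      # second check (assumed reading): a+ b+ b d
            deleted = "abb"
        else:
            return False, distances           # both checks failed: reject
        mid = w.index("ab")                   # the "middle": last a, followed by b
        distances.append(len(w) - mid + 1)    # +1 counts the right sentinel $
        w = w[:mid] + w[mid + len(deleted):]  # delete, then restart
```

For instance, on aaabbbbbbd the sketch deletes abb three times, with right distances 9, 7 and 5, and then accepts d.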

Next we will introduce a sequence of languages serving as witness languages 
in the following text. 

Examples. Let $j \in Nat$ and let us consider the alphabet $A_j = \Sigma_{ab} \cup \Sigma_{cd} \cup \{c_0, c_1, \dots, c_{j-1}\}$, where $\Sigma_{ab} = \{a, b\}$ and $\Sigma_{cd} = \{c, d\}$. Further, let $L_{ab} = (\Sigma_{ab} \cdot \Sigma_{cd})^*$ and $L_{cd} = \Sigma_{cd} \cdot (\Sigma_{ab} \cdot \Sigma_{cd})^*$. We define a sequence of languages in the following way: for each $j \in Nat$ let

$$L_j = \{ c_0xw\,c_1xw \cdots c_{i-1}xw\,c_iw\,c_{i+1}w \cdots c_{j-2}w\,c_{j-1}w \mid 0 \le i \le j,\ (x \in \Sigma_{ab} \wedge w \in L_{cd}) \text{ or } (x \in \Sigma_{cd} \wedge w \in L_{ab}) \}.$$

For a word $c_0w_1c_1w_2c_2 \cdots c_{j-1}w_j$ from $L_j$, the words $w_1, \dots, w_j$ are either empty words or words in which the symbols from $\Sigma_{ab}$ and $\Sigma_{cd}$ alternate and whose last symbol is from $\Sigma_{cd}$. If at least one of the words $w_1, \dots, w_j$ is nonempty, then

$$w_1 = w_2 = \dots = w_i = xw, \qquad w_{i+1} = w_{i+2} = \dots = w_j = w$$

for some $i$, $1 \le i \le j$, some symbol $x \in \Sigma_{ab} \cup \Sigma_{cd}$ and some word $w$ such that $xw \in L_{ab} \cup L_{cd}$.
Proposition 8. For any $j \in Nat$: $L_j \in \mathcal{L}(\text{det-}j\text{-mon-RL})$.

Proof. We outline a det-j-mon-RL-automaton M recognizing $L_j$. Let $u \in (A_j)^*$ be the current word in the list of M (between the sentinels). M looks at the first two symbols of u; if they are equal to

— $c_0c_1$, then M checks the equality $u = c_0c_1 \cdots c_{j-1}$. In the positive case M accepts, in the negative case M rejects.

— $c_0x$, for some $x \in \Sigma_{ab} \cup \Sigma_{cd}$, then M checks whether u is of the form

$$c_0xw_1c_1xw_2c_2 \cdots c_{i-1}xw_ic_iw_{i+1}c_{i+1}w_{i+2} \cdots c_{j-2}w_{j-1}c_{j-1}w_j,$$

where either $x \in \Sigma_{ab}$ and $w_1, \dots, w_j \in L_{cd}$, or $x \in \Sigma_{cd}$ and $w_1, \dots, w_j \in L_{ab}$. If the result of the check is negative, M rejects. In the positive case, M deletes the first symbol after $c_{i-1}$ (i.e., x) during the corresponding cycle and restarts.

Any computation of M can be divided into parts in which the same symbol x from $\Sigma_{ab} \cup \Sigma_{cd}$ is deleted. Each part comprising cycles deleting some symbol from $\Sigma_{ab}$ can be followed only by a part comprising cycles deleting some symbol from $\Sigma_{cd}$, and vice versa. In an accepting computation the first part consists of at most j cycles, and all the remaining parts have exactly j cycles. In a rejecting computation the first and the last part cannot contain more than j cycles, and all the remaining parts contain exactly j cycles.

It is easy to see that any computation (without the tail) can be partitioned into (at most) j monotonic subsequences of cycles: the first subsequence consists of the first cycles of the individual parts, the second one of their second cycles, and so on. (Within a part, each cycle deletes further to the left than the previous one, whereas the t-th cycles of consecutive parts delete at a non-increasing distance from the right end, since the blocks only shrink.) This observation proves the claim. □
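The cycle structure described in this proof can be checked on small examples. The following Python sketch abstracts away head movements and is not meant as the automaton construction itself: it encodes the marker c_m as the integer m and the letters as the strings "a", "b", "c", "d", applies the checks of M, performs its deletions, records the right distances (counting the right sentinel) and computes the least j for which the resulting sequence is j-monotonic, i.e., the length of the longest strictly increasing subsequence. All function names are hypothetical.

```python
from bisect import bisect_left

AB, CD = set("ab"), set("cd")


def in_lab(w):
    """L_ab = (Sigma_ab Sigma_cd)*: even length, a/b at even positions, c/d at odd ones."""
    return len(w) % 2 == 0 and all(ch in (AB if pos % 2 == 0 else CD) for pos, ch in enumerate(w))


def in_lcd(w):
    """L_cd = Sigma_cd (Sigma_ab Sigma_cd)*: odd length, c/d at even positions."""
    return len(w) % 2 == 1 and all(ch in (CD if pos % 2 == 0 else AB) for pos, ch in enumerate(w))


def blocks(u, j):
    """Split u = c_0 w_1 c_1 w_2 ... c_{j-1} w_j into [w_1, ..., w_j], or None."""
    marks = [pos for pos, tok in enumerate(u) if isinstance(tok, int)]
    if [u[pos] for pos in marks] != list(range(j)) or marks[0] != 0:
        return None
    ends = marks[1:] + [len(u)]
    return ["".join(u[s + 1:e]) for s, e in zip(marks, ends)]


def run_lj_automaton(u, j):
    """Cycle-level sketch of M from Proposition 8 (j >= 1); returns (verdict, distances)."""
    u, distances = list(u), []
    while True:
        ws = blocks(u, j)
        if ws is None:
            return "reject", distances
        if all(w == "" for w in ws):       # u = c_0 c_1 ... c_{j-1}
            return "accept", distances
        if ws[0] == "":                     # starts with c_0 c_1 but has more symbols
            return "reject", distances
        x = ws[0][0]
        member = in_lcd if x in AB else in_lab
        i = 0                               # number of leading blocks of the form x w
        while i < j and ws[i][:1] == x and member(ws[i][1:]):
            i += 1
        if i == 0 or not all(member(w) for w in ws[i:]):
            return "reject", distances
        pos = u.index(i - 1) + 1            # the first symbol after c_{i-1}, i.e. x
        distances.append(len(u) - pos + 1)  # right distance, counting the sentinel $
        del u[pos]                          # delete x and restart


def min_monotonic_cover(distances):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []
    for d in distances:
        k = bisect_left(tails, d)
        if k == len(tails):
            tails.append(d)
        else:
            tails[k] = d
    return len(tails)
```

On the word c_0 bcad c_1 bcad c_2 cad from L_3 (x = b, w = cad, i = 2) the sketch accepts after 11 cycles with right distances 9, 13, 4, 7, 10, 3, 5, 7, 2, 3, 4; the longest strictly increasing subsequence has length 3, so this computation is 3-monotonic but not 2-monotonic.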

Proposition 9. For any $j \in Nat$: $L_{j+1} \notin \mathcal{L}(j\text{-mon-RLW})$.

Proof. Let us suppose that $M = (Q, \Sigma, \Gamma, k, I, q_0, Q_A, Q_R)$ is a j-monotonic RLW-automaton recognizing $L_{j+1}$. Let p be the constant determined for M by Fact 5 (pumping lemma). Let us take a word $z_0 = c_0wc_1wc_2 \cdots c_jw \in L_{j+1}$, where $w \in L_{ab}$ and $|w| = n > 3kp$ (k is the size of the look-ahead window of M).

Let us consider an accepting computation C of M on $z_0$. This computation contains at least one cycle $C_0$; otherwise, because of the length of $z_0$, we could use Fact 5 to construct an accepting tail for a word outside $L_{j+1}$. Let the resulting word after $C_0$ be $z_1$. Since C is an accepting computation, $z_1 \in L_{j+1}$ (by the error preserving property), and therefore the only possible change performed by $C_0$ is the deletion of the first symbol after the symbol $c_j$. Therefore, $z_1 = c_0wc_1wc_2 \cdots wc_{j-1}wc_jw'$, where $w = xw'$ for some $x \in \Sigma_{ab}$.

We can see that C continues (for similar reasons) with another j cycles $C_1, \dots, C_j$, where $C_t$ deletes the first symbol after $c_{j-t}$ (for $1 \le t \le j$). We can see that $D_r(C_0) < D_r(C_1) < \dots < D_r(C_j)$. These $j+1$ cycles with strictly increasing right distances cannot be distributed among j monotonic subsequences, which contradicts the assumption that M is a j-monotonic RLW-automaton. □




Two-Way Restarting Automata and J-Monotonicity 






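Theorem 10. For any $j \in Nat$:

$\mathcal{L}(j\text{-mon-RLW}) \subset \mathcal{L}((j+1)\text{-mon-RLW})$,

$\mathcal{L}(\text{det-}j\text{-mon-RLW}) \subset \mathcal{L}(\text{det-}(j+1)\text{-mon-RLW})$,

$\mathcal{L}(j\text{-mon-RL}) \subset \mathcal{L}((j+1)\text{-mon-RL})$,

$\mathcal{L}(\text{det-}j\text{-mon-RL}) \subset \mathcal{L}(\text{det-}(j+1)\text{-mon-RL})$.

Proof. We can see from the definition of j-monotonicity that the improper inclusions hold. The proper inclusions follow from the previous two propositions: by Proposition 8, $L_{j+1}$ belongs to $\mathcal{L}(\text{det-}(j+1)\text{-mon-RL})$ and hence to all four classes on the right-hand sides, while by Proposition 9 it does not belong to $\mathcal{L}(j\text{-mon-RLW})$ and hence to none of the classes on the left-hand sides. □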



Theorem 11. For any $j \in Nat$:

$\mathcal{L}(\text{RL}) \subset \mathcal{L}(\text{RLW})$,

$\mathcal{L}(j\text{-mon-RL}) \subset \mathcal{L}(j\text{-mon-RLW})$.

Proof. The improper inclusions are obvious. We use the idea of the proof of Lemma 4.2 in [1]. Let us take the language

$$L_1 = \{f, ee\} \cdot \{c^nd^n \mid n \ge 0\} \cup \{g, ee\} \cdot \{c^nd^m \mid m > 2n \ge 0\}.$$

$L_1$ can be recognized by a monotonic RLW-automaton M in the following way:

— M immediately accepts the word f; otherwise,

— if the word starts with fc, then M simply deletes cd “in the middle” of the word and restarts,

— if the word starts with gc, then M deletes cdd “in the middle” of the word and restarts,

— if the word starts with gd, then M scans the rest of the word. If it contains only d's, M accepts; otherwise it rejects,

— if the word starts with ee, then M nondeterministically rewrites ee by f or g and restarts.

It is easy to see that M is monotonic and $L(M) = L_1$.

$L_1$ cannot be recognized by any RL-automaton. For a contradiction, let us suppose $L_1 = L(M)$ for some RL-automaton M with look-ahead of length k. Let us choose (and fix) a sufficiently large n ($n > k$) such that n is divisible by $p!$ (and hence by all $p' \le p$), where p is taken from Fact 5. Now consider the first cycle C of an accepting computation of M on $eec^nd^n$. M can only shorten both segments of c's and d's in the same way, i.e., $eec^nd^n \Rightarrow_M eec^ld^l$ for some $l < n$. (Any accepting computation of M on $eec^nd^n$ has at least two cycles; otherwise, using Fact 5, we could construct a word outside $L_1$ which would be accepted by M in one cycle.) Due to Fact 5, $d^n$ can be written as $d^n = v_1\alpha uv_2\alpha uv_3$, where $\alpha = d$, $u = d^{k-1}$, $|\alpha uv_2| \le p$, and M in the cycle C enters both occurrences of $\alpha$ in the same state.

Recall that nothing is deleted out of $\alpha uv_2$ in C and that $|\alpha uv_2|$ divides n due to our choice of n. Then there is some i such that $d^{2n} = v_1(\alpha uv_2)^i\alpha uv_3$; hence $eec^nd^{2n} \notin L(M)$, but (by the natural extension of the cycle C) we surely get $eec^nd^{2n} \Rightarrow_M eec^ld^{n+l}$, where $2l < n + l$, and therefore $eec^ld^{n+l} \in L(M)$, which contradicts the error preserving property (Fact 1). □
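As a sanity check of the claim L(M) = L_1 for the automaton M used in this proof, the following Python sketch compares a cycle-level simulation of M with a direct membership test on all short words over {c, d, e, f, g}. It is an illustration only: it assumes (the text leaves this implicit) that M verifies the form f c^+ d^+ or g c^+ d d^+ before deleting “in the middle”, and it resolves the nondeterministic rewriting of ee by trying both f and g. The function names are hypothetical.

```python
import re


def in_l1(w):
    """Direct test for L_1 = {f, ee}{c^n d^n | n >= 0} ∪ {g, ee}{c^n d^m | m > 2n >= 0}."""
    for prefix in ("f", "g", "ee"):
        if not w.startswith(prefix):
            continue
        m = re.fullmatch(r"(c*)(d*)", w[len(prefix):])
        if m is None:
            continue
        n, k = len(m.group(1)), len(m.group(2))
        if prefix in ("f", "ee") and k == n:
            return True
        if prefix in ("g", "ee") and k > 2 * n:
            return True
    return False


def accepts(w):
    """Cycle-level sketch of the nondeterministic RLW-automaton M for L_1."""
    if w == "f":
        return True
    if w.startswith("fc"):
        if not re.fullmatch(r"fc+d+", w):      # assumed form check before deleting
            return False
        mid = w.index("cd")                     # the c/d boundary ("the middle")
        return accepts(w[:mid] + w[mid + 2:])  # delete cd, restart
    if w.startswith("gc"):
        if not re.fullmatch(r"gc+dd+", w):     # assumed form check (needs two d's)
            return False
        mid = w.index("cd")
        return accepts(w[:mid] + w[mid + 3:])  # delete cdd, restart
    if w.startswith("gd"):
        return set(w[1:]) == {"d"}             # tail: accept iff only d's follow
    if w.startswith("ee"):
        # nondeterministic rewriting of ee by f or by g, followed by a restart
        return accepts("f" + w[2:]) or accepts("g" + w[2:])
    return False
```

For example, eecddd is accepted only through the choice g (gcddd, then gd), while the choice f leads to fdd, which is rejected.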







Martin Plátek



4 Additional Remarks 

Let us note that our aim is to present basic formal considerations which should serve as a preparation for a software environment performing analysis by reduction for Czech or other natural languages like German or Latin. We are deeply influenced by our linguistic tradition. Our linguists feel that there is something like a ’basic word-order’ in Czech sentences on the one hand, and several types of relaxation of this basic word-order on the other hand. We feel that monotonic computations can represent the analysis by reduction of sentences in the basic word-order, while j-monotonic computations can represent more relaxed variants of the word-order. j-monotonicity allows us to formulate ’characteristic’ restrictions of the relaxation for individual natural languages. In cooperation with linguists, we are looking for adequate ’characteristic’ restrictions for Czech.

We can see that already the second witness language $L_2$ is not a context-free language. These observations and the presented results allow us to consider j-monotonicity as a degree of non-context-freeness of RL-languages, RLW-languages and their deterministic variants.

Let us outline some conjectures and ideas for future work. We believe that we will be able to show a polynomial algorithm for the recognition of j-monotonic RLW(W)-languages. We will be interested in nondeterministic RLW-automata which keep (completely or for some instructions only) the correctness preserving property. This type of consideration is useful for the technology of localization of syntactic inconsistencies (grammar checking). We believe that we are able to show that any RLW- or RRW-automaton which fulfills the correctness preserving property can be transformed into an equivalent deterministic RLW-automaton. The results from [7] ensure that there are nondeterministic RRW-automata with the correctness preserving property which cannot be transformed into an equivalent deterministic RRW-automaton. This observation stresses the importance of two-way restarting automata for applications as well as for theoretical considerations. A more general study in this direction can lead us to a comparison of the results from [5] with the properties of RLWW-automata. We also intend to look for classes of g-systems (see [8]) similar to the classes of restarting automata studied here.

Let us note at the end that there is an essential difference between one-way and two-way restarting automata: two-way restarting automata allow infinite computations and (nondeterministic) cycles of unlimited length. This fact will turn our attention to the study of normal forms of RLWW-automata in the near future.
Abstract. The paper shows a simple LOGSPACE-reduction from the boolean circuit value problem which demonstrates that, on finite labelled transition systems, deciding an arbitrary relation which subsumes bisimulation equivalence and is subsumed by trace preorder is a polynomial-time-hard problem (and thus cannot be expected to be efficiently parallelizable). By this, the result of Balcázar, Gabarró and Santha (1992) for bisimilarity is substantially extended.



1 Introduction 

It is not necessary to emphasize the importance of theoretical foundations for the design, analysis and verification of (complex) systems, especially concurrent systems, which are composed of communicating components running in parallel. One particular research area studies the computational complexity of various verification problems for finite-state systems.

A general model of such systems is given by so-called labelled transition systems (LTSs for short), which capture the notion of (global) states and their changes by performing transitions, which are labelled by actions (or action names). Since we deal only with finite LTSs here, they can be viewed as classical nondeterministic finite automata.

We consider the verification problem of testing behavioural equivalences on finite LTSs. Let us recall that classical language equivalence turned out to be too coarse in most cases, and it was bisimilarity which was established as the most appropriate notion of general behavioural equivalence (cf. [6]). Nevertheless, other notions of equivalences (or preorders) turned out to be useful for more specific aims. Van Glabbeek [9] classified these equivalences in a hierarchy called the linear time/branching time spectrum. The diagram in Fig. 1 shows the most prominent members of the hierarchy and their interrelations (an arrow from R to S means that equivalence R is finer than equivalence S). As depicted, bisimilarity (i.e., bisimulation equivalence) is the finest in the spectrum; the coarsest is trace equivalence, which is the classical language equivalence when we assume all states to be accepting (i.e., we are interested in the set of all performable sequences of actions).

* Supported by the Grant Agency of the Czech Republic, Grant No. 201/00/0400
[Figure: diagram of the spectrum; recoverable node labels include “Bisimulation equivalence” and “2-nested simulation equivalence”]

Fig. 1. The linear time/branching time spectrum



For automated verification, a natural research task is to establish the complexity of the problem

Instance: A finite labelled transition system and two of its states, p and q.

Question: Are p and q equivalent with respect to X?

for each equivalence X in the spectrum. From results of language theory we can easily derive PSPACE-completeness of trace equivalence. On the other hand, there is a polynomial-time algorithm for bisimilarity [7,4]. The paper [3] is a (preliminary) survey of all results in this area. Loosely speaking, ‘trace-like’ equivalences (in the bottom part of the spectrum) turn out to be PSPACE-complete, while ‘simulation-like’ equivalences (in the top part of the spectrum) are in PTIME. Balcázar, Gabarró and Santha [1] have considered the question of an efficient parallelization of the algorithm for bisimilarity, and they have shown that the problem is P-complete (i.e., all polynomial-time problems are reducible to this problem by a LOGSPACE-reduction). This shows that the bisimilarity problem seems to be ‘inherently sequential’: we cannot get a real gain by parallelization unless NC = PTIME, which is considered to be very unlikely (cf., e.g., [2]).

Paper [1] shows a (LOGSPACE) reduction from (a special version of) the boolean circuit value problem, which is well known to be P-complete. The reduction aims just at bisimilarity; in particular, it does not show P-hardness of other ‘simulation-like’ equivalences (which are known to be in PTIME as well).

In this paper, we show another reduction from (a less constrained version of) the circuit value problem, which we find simple and elegant, and which immediately shows that deciding an arbitrary relation which subsumes bisimulation equivalence and is subsumed by trace equivalence (more generally, by trace preorder) is P-hard. By this, the result of [1] is substantially extended; though it brings nothing new for the (‘trace-like’) equivalences for which PSPACE-hardness has been established, it is certainly relevant for ‘simulation-like’ equivalences (those between bisimulation and simulation equivalences in the spectrum).

Section 2 gives the necessary definitions and formulates the main result, and Section 3 contains the technical proof. We then add remarks on the possibility to ‘lift’ the result to settle a conjecture in [8].

2 Definitions 

A labelled transition system (an LT-system for short) is a tuple $(S, Act, \longrightarrow)$ where $S$ is a set of states, $Act$ is a set of actions (or labels), and ${\longrightarrow} \subseteq S \times Act \times S$ is a transition relation. We write $p \xrightarrow{a} q$ instead of $(p, a, q) \in {\longrightarrow}$; we also use $p \xrightarrow{w} q$ for finite sequences of actions ($w \in Act^*$) with the natural meaning. In this paper, we only consider finite LT-systems, where both the sets $S$ and $Act$ are finite.

We need precise definitions of trace and bisimulation equivalences on states in LT-systems. Let us remark that it is sufficient for us to relate only states of the same LT-system; this could naturally be extended to states of different LT-systems (we can take their disjoint union).

For a state $p$ of an LT-system $(S, Act, \longrightarrow)$, we define the set of its traces as $tr(p) = \{w \in Act^* \mid p \xrightarrow{w} q \text{ for some } q \in S\}$. States $p$ and $q$ are trace equivalent iff $tr(p) = tr(q)$; they are in trace preorder iff $tr(p) \subseteq tr(q)$.
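For finite LT-systems the trace preorder can be decided by a subset construction on the side of q. The Python sketch below is an illustration, not taken from the paper; the encoding of an LT-system as a dictionary mapping each state to a set of (action, successor) pairs and the function names are assumptions. It explores pairs consisting of a state reached from p and the set of all states reached from q by the same word; a trace of p that is missing from tr(q) shows up as a pair whose set becomes empty. In the worst case the number of explored pairs is exponential, in line with the PSPACE-completeness of trace equivalence mentioned in the introduction.

```python
from collections import deque


def post(lts, states, action):
    """All successors under `action` of a set of states (lts: state -> {(action, succ)})."""
    return frozenset(t for s in states for (a, t) in lts.get(s, ()) if a == action)


def trace_counterexample(lts, p, q):
    """Return a shortest trace of p that q cannot perform, or None if tr(p) ⊆ tr(q).

    Breadth-first search over pairs (p', Q'), where p' is reached from p by some
    word w and Q' is the set of all states reached from q by the same w.
    """
    start = (p, frozenset([q]))
    seen = {start}
    queue = deque([(start, ())])
    while queue:
        (p1, qs), word = queue.popleft()
        for a, p2 in lts.get(p1, ()):
            qs2 = post(lts, qs, a)
            if not qs2:  # p can perform word + a, but no state reached from q can
                return word + (a,)
            if (p2, qs2) not in seen:
                seen.add((p2, qs2))
                queue.append(((p2, qs2), word + (a,)))
    return None


def trace_equivalent(lts, p, q):
    return trace_counterexample(lts, p, q) is None and trace_counterexample(lts, q, p) is None
```

In the usual example, the states p = a.(b + c) and q = a.b + a.c are trace equivalent, while a state r whose only maximal trace is ab lies strictly below p in the trace preorder, with ac as a witness.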

A binary relation $\mathcal{R} \subseteq S \times S$ on the state set of an LT-system $(S, Act, \longrightarrow)$ is a simulation iff for each $(p, q) \in \mathcal{R}$ and each $p \xrightarrow{a} p'$ there is some $q \xrightarrow{a} q'$ such that $(p', q') \in \mathcal{R}$. $\mathcal{R}$ is a bisimulation iff both $\mathcal{R}$ and its inverse $\mathcal{R}^{-1}$ are simulations. States $p, q$ are bisimulation equivalent (or bisimilar), written $p \sim q$, iff $(p, q) \in \mathcal{R}$ for some bisimulation $\mathcal{R}$.
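On a finite LT-system the largest bisimulation can be computed as a greatest fixpoint: start from all pairs of states and repeatedly remove pairs that violate the transfer condition in either direction. The Python sketch below is a naive illustration of this idea (polynomial, but not meant to reflect the algorithms of [7,4]); it encodes an LT-system as a dictionary mapping each state to a set of (action, successor) pairs, which is an assumed representation, and the function names are hypothetical.

```python
def largest_bisimulation(lts):
    """Greatest bisimulation on a finite LT-system (naive fixpoint iteration).

    lts maps each state to a set of (action, successor) pairs (assumed encoding).
    """
    states = set(lts) | {t for succ in lts.values() for (_, t) in succ}
    rel = {(p, q) for p in states for q in states}  # start from all pairs

    def transfer_ok(p, q):
        # every p -a-> p2 is matched by some q -a-> q2 with (p2, q2) in rel ...
        forth = all(any(b == a and (p2, q2) in rel for (b, q2) in lts.get(q, ()))
                    for (a, p2) in lts.get(p, ()))
        # ... and every q -a-> q2 is matched by some p -a-> p2 with (p2, q2) in rel
        back = all(any(b == a and (p2, q2) in rel for (b, p2) in lts.get(p, ()))
                   for (a, q2) in lts.get(q, ()))
        return forth and back

    changed = True
    while changed:  # terminates because rel only shrinks
        changed = False
        for pair in list(rel):
            if not transfer_ok(*pair):
                rel.discard(pair)
                changed = True
    return rel


def bisimilar(lts, p, q):
    return (p, q) in largest_bisimulation(lts)
```

A pair is removed only if it cannot belong to any bisimulation contained in the current relation, so the result contains every bisimulation; and since every remaining pair satisfies the transfer condition, the result is itself a bisimulation. The states a.(b + c) and a.b + a.c are the classical example of trace-equivalent states that are not bisimilar.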

We recall that a problem P is P-hard if any problem in PTIME can be reduced to P by a LOGSPACE reduction; recall that a Turing machine performing such a reduction uses work space of size at most $O(\log n)$, where n denotes the size of the input on a read-only input tape (the output is written on a write-only output tape and its size may be polynomial). A problem P is P-complete if P is P-hard and $P \in$ PTIME.

Remark. We recall that if a problem P is P-hard, then it is unlikely that there exists an efficient parallel algorithm deciding P. ‘Efficient’ here means working in polylogarithmic time, i.e., with time complexity in $O(\log^k n)$ for some constant k, while the number of processors used is bounded by a polynomial in the size n of the input instance. (See, e.g., [2] for further details.)

We say that a relation X (relating states in transition systems) is between bisimilarity and trace preorder iff $p \sim q$ implies $pXq$, and $pXq$ implies $tr(p) \subseteq tr(q)$. We now formulate the main result of our paper:
Theorem 1. For any relation X between bisimilarity and trace preorder, the following problem is P-hard:

Instance: A finite labelled transition system and two of its states, p and q.

Question: Is pXq?

We shall prove this in the next section by a LOGSPACE reduction from the 
problem of monotone boolean circuit value, mCVP for short. 

To define mCVP we need some definitions. A monotone boolean circuit is a directed acyclic graph in which the nodes (also called gates) are either of indegree zero (input gates) or of indegree 2 (non-input gates). There is exactly one node of outdegree zero (the output gate). Non-input gates are labelled by one of $\{\wedge, \vee\}$ (notice that no $\neg$-gates are used in a monotone circuit). An input of the circuit is an assignment of boolean values (i.e., values from the set $\{0, 1\}$) to the input gates. The value on a non-input gate labelled with $\wedge$ (resp. $\vee$) is computed as the conjunction (resp. disjunction) of the values on its ancestors. The output value of the circuit is the value on the output gate.

The mCVP problem is defined as follows: 

Instance: A monotone boolean circuit with its input. 

Question: Is the output value 1 ? 

The problem is well known to be P-complete (cf. e.g. [2]). We also recall that
if $P_1$ is P-hard and $P_1$ is LOGSPACE reducible to $P_2$, then $P_2$ is P-hard as well.

In the next section we show how, given an instance of mCVP, to construct
(in LOGSPACE) a transition system with two designated states $p, q$ such that if
the output value of the circuit is 1, then $p \sim q$, and if the output value is 0, then
$tr(p) \not\subseteq tr(q)$. So for any relation $X$ between bisimilarity and trace preorder,
$pXq$ iff the output value is 1. This immediately implies the theorem.



3 The Reduction 

Consider an instance of mCVP where the set of gates is $V = \{1, 2, \ldots, n\}$.
For every non-input gate $i$, we define $l(i), r(i)$ (the left and right ancestors of $i$) to be
the gates from which there are edges to $i$ (so $l(i) \neq r(i)$). For
technical reasons we assume the gates are topologically ordered, i.e., for every
non-input gate $i$ we have $i > l(i) > r(i)$, and $n$ is the output gate (mCVP is still
P-complete under this assumption). We define a function $t : V \to \{0, 1, \wedge, \vee\}$,
where $t(i)$ denotes the ‘type’ of gate $i$:

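$$t(i) = \begin{cases}
0 & \text{if } i \text{ is an input gate with value } 0,\\
1 & \text{if } i \text{ is an input gate with value } 1,\\
\wedge & \text{if } i \text{ is a non-input gate labelled with } \wedge,\\
\vee & \text{if } i \text{ is a non-input gate labelled with } \vee.
\end{cases}$$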

Let $v_i \in \{0, 1\}$ denote the actual value on gate $i$. That is, if $i$ is an input gate, then
$v_i = t(i)$, and if $i$ is a non-input gate, then $v_i$ is computed from $v_{l(i)}, v_{r(i)}$ using the
operation indicated by $t(i)$.
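The gates are topologically ordered ($i > l(i) > r(i)$), so all values $v_i$ can be computed in one pass over $i = 1, \ldots, n$. The minimal sketch below assumes a hypothetical encoding: $t(i)$ is one of the strings `"0"`, `"1"`, `"and"`, `"or"`, and $l$, $r$ are dictionaries defined only on non-input gates.

```python
from typing import Dict, List


def gate_values(n: int, t: Dict[int, str], l: Dict[int, int], r: Dict[int, int]) -> List[int]:
    """Return [v_1, ..., v_n] for a topologically ordered monotone circuit.

    Hypothetical encoding: t[i] in {"0", "1", "and", "or"}; l[i], r[i] are given
    only for non-input gates and satisfy i > l[i] > r[i].
    """
    v: Dict[int, int] = {}
    for i in range(1, n + 1):
        if t[i] in ("0", "1"):
            v[i] = int(t[i])  # input gate: v_i = t(i)
        else:
            assert i > l[i] > r[i], "gates must be topologically ordered"
            a, b = v[l[i]], v[r[i]]  # already computed, since l[i], r[i] < i
            v[i] = (a & b) if t[i] == "and" else (a | b)
    return [v[i] for i in range(1, n + 1)]
```

For example, take the circuit with gates 1 and 2 as inputs set to 1 and 0, gate 3 as $\vee(2, 1)$, and gate 4 as $\wedge(3, 2)$. Then `gate_values(4, {1: "1", 2: "0", 3: "or", 4: "and"}, {3: 2, 4: 3}, {3: 1, 4: 2})` computes the values $1, 0, 1, 0$.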
We assume that an input instance of mCVP consists of $n$ and of the values
$l(i), r(i)$ and $t(i)$ for every $1 \le i \le n$ (in fact, it suffices that these values can be
computed from the input instance in LOGSPACE).

Given an instance of mCVP, we construct the LT-system $\mathcal{A} = (S, Act, \longrightarrow)$,
where $Act = \{0, 1\}$ and $S$ is the union of the following sets:

- $\{p_i \mid 0 \le i \le n\}$,

- $\{q_i^j \mid 1 \le j \le i \le n\}$,

- $\{q_i^{j,k} \mid 1 \le k < j \le i \le n\}$.

We organize the states in $S$ into levels. Level $i$ (where $0 \le i \le n$) contains all
states with the same lower index $i$, i.e., $\{p_i\} \cup \{q_i^j \mid 1 \le j \le i\} \cup \{q_i^{j,k} \mid 1 \le k < j \le i\}$,
as depicted in Fig. 2 (already with some transitions).



[Figure: levels 0 to 3 of $\mathcal{A}$, with states $p_i$, $q_i^j$ and $q_i^{j,k}$ and transitions labelled 0,1 between consecutive levels]

Fig. 2. The states of $\mathcal{A}$ organized into levels
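The levels can be listed directly from the three index ranges. In the sketch below, states are encoded as tuples; this encoding is a hypothetical choice made only for the illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: ("p", i) for p_i, ("q", i, j) for q_i^j, ("q", i, j, k) for q_i^{j,k}.
State = Tuple


def levels(n: int) -> Dict[int, List[State]]:
    """Group the states of A by level, i.e. by their lower index i, 0 <= i <= n."""
    by_level: Dict[int, List[State]] = {}
    for i in range(0, n + 1):
        level: List[State] = [("p", i)]
        level += [("q", i, j) for j in range(1, i + 1)]  # 1 <= j <= i
        level += [("q", i, j, k) for j in range(1, i + 1) for k in range(1, j)]  # 1 <= k < j <= i
        by_level[i] = level
    return by_level
```

For $n = 3$, level 3 has 7 states: $p_3$, $q_3^1, q_3^2, q_3^3$ and $q_3^{2,1}, q_3^{3,1}, q_3^{3,2}$. All four levels together hold 14 states.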



Informally speaking, the intended purpose of the states $q_i^j$ is to ‘test’ whether
$v_j = 1$. State $q_i^j$ is viewed as ‘successful’ if indeed $v_j = 1$ and as ‘unsuccessful’ if
$v_j = 0$. Similarly, state $q_i^{j,k}$ is ‘successful’ if $v_j = 1$ and $v_k = 1$, and ‘unsuccessful’
otherwise.

We construct the transition relation $\longrightarrow$ so that each
successful state $q$ on level $i$ (i.e., of the form $q_i^j$ or $q_i^{j,k}$) is bisimilar with $p_i$,
and if $q$ (on level $i$) is unsuccessful, then $tr(p_i) \not\subseteq tr(q)$. So $p_n$ and $q_n^n$ are
the two distinguished states announced in the previous section. They have the required
property: if $v_n = 1$, then $p_n \sim q_n^n$, and otherwise $tr(p_n) \not\subseteq tr(q_n^n)$.

The transition relation $\longrightarrow$ will contain only transitions going from states on
level $i$ to states on level $i-1$. We construct the transitions level by level, starting
with transitions going from (states on) level 1 to level 0, then from level 2 to
level 1, and so on. The actual transitions going from level $i$ to level $i-1$ depend
on $t(i)$, $l(i)$ and $r(i)$, so in this sense level $i$ corresponds to gate $i$. It is worth
emphasizing that the added transitions do not depend on whether a state is
successful or not.

Now we describe in detail the construction of transitions leading from states 
on level i to states on level i—1. 
Firstly, the following transitions are always present from states on level $i$
(see Fig. 2):

$$p_i \xrightarrow{0,1} p_{i-1},$$
$$q_i^j \xrightarrow{0,1} q_{i-1}^j \quad \text{for each } j \text{ such that } 1 \le j < i,$$
$$q_i^{j,k} \xrightarrow{0,1} q_{i-1}^{j,k} \quad \text{for each } j, k \text{ such that } 1 \le k < j < i.$$

(We write $q \xrightarrow{0,1} q'$ instead of $q \xrightarrow{0} q'$ and $q \xrightarrow{1} q'$.) Depending on $t(i)$, some
other transitions leading from states on level $i$ may be added.
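The transitions that are always present from level $i \ge 1$ use the same index ranges as the states, shifted one level down. The sketch below builds them with the same hypothetical tuple encoding of states; actions are written as the strings `"0"` and `"1"`.

```python
from typing import List, Tuple

State = Tuple  # hypothetical encoding: ("p", i), ("q", i, j), ("q", i, j, k)
Transition = Tuple[State, str, State]  # (source, action, target)


def always_transitions(i: int) -> List[Transition]:
    """Transitions from level i >= 1 to level i - 1 that exist whatever t(i) is."""
    out: List[Transition] = []
    for a in ("0", "1"):  # q --0,1--> q' abbreviates q --0--> q' and q --1--> q'
        out.append((("p", i), a, ("p", i - 1)))
        for j in range(1, i):  # 1 <= j < i
            out.append((("q", i, j), a, ("q", i - 1, j)))
            for k in range(1, j):  # 1 <= k < j < i
                out.append((("q", i, j, k), a, ("q", i - 1, j, k)))
    return out
```

These transitions never leave $q_i^i$ or $q_i^{i,k}$. The outgoing transitions of those states are the ones that depend on $t(i)$.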

To simplify the notation, we need some further definitions. Let $Q_i$ be the
set of all states of the form $q_i^j$ or $q_i^{j,k}$ on level $i$, i.e., $Q_i = \{q_i^j \mid 1 \le j \le i\} \cup \{q_i^{j,k} \mid 1 \le k < j \le i\}$.
Let $Succ$ be the set of all successful states, i.e.,
$Succ = \{q_i^j \in S \mid v_j = 1\} \cup \{q_i^{j,k} \in S \mid v_j = 1 \text{ and } v_k = 1\}$.

We use $w_i$ to denote the sequence of actions that corresponds to the actual values
on gates $i, i-1, \ldots, 1$, i.e., $w_1 = v_1$ and $w_i = v_i w_{i-1}$ for $i > 1$.

We will construct $\mathcal{A}$ so that each level $i$ satisfies the following
condition:

$$\text{For each } q \in Q_i:\ \text{if } q \in Succ, \text{ then } p_i \sim q, \text{ otherwise } w_i \notin tr(q). \qquad (1)$$

Notice that $w_i \notin tr(q)$ implies $tr(p_i) \not\subseteq tr(q)$, because $tr(p_i)$ contains all possible
traces of length $i$ over $Act$, so in particular $w_i \in tr(p_i)$.
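On a finite system, the word $w_i$ and the membership test $w \in tr(q)$ in condition (1) are easy to compute. The test tracks the set of states reachable along the word. A minimal sketch, again assuming a hypothetical adjacency-map encoding with actions as the strings `"0"` and `"1"`:

```python
from typing import Dict, Hashable, List, Tuple

LTS = Dict[Hashable, List[Tuple[str, Hashable]]]  # hypothetical encoding


def word_w(values: List[int], i: int) -> str:
    """w_i = v_i v_{i-1} ... v_1, where values[m - 1] holds v_m."""
    return "".join(str(values[m - 1]) for m in range(i, 0, -1))


def has_trace(lts: LTS, q: Hashable, word: str) -> bool:
    """Decide word ∈ tr(q) by following all runs of q on the word simultaneously."""
    current = {q}
    for a in word:
        current = {s2 for s in current for b, s2 in lts.get(s, []) if b == a}
        if not current:
            return False
    return True
```

For gate values $v_1, \ldots, v_4 = 1, 0, 1, 0$, `word_w([1, 0, 1, 0], 4)` produces `"0101"`, which reads the values from gate 4 down to gate 1.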

Now we describe the remaining transitions together with a proof that each
level satisfies condition (1). We proceed by induction on $i$.

Remark. To show for some $q \in Q_i$ with $q \notin Succ$ that $w_i \notin tr(q)$, it
suffices to show that $w_{i-1} \notin tr(q')$ for every $q'$ reachable by a transition $q \xrightarrow{v_i} q'$,
i.e., to show that $q' \notin Succ$ (because $q' \notin Succ$ implies $w_{i-1} \notin tr(q')$ by the
induction hypothesis).



Base of induction ($i = 1$): Because the circuit is topologically ordered, gate 1
must be an input gate, so $t(1)$ is either 0 or 1. If $t(1) = 0$, we do not add any
transitions (see Fig. 3). Because $t(1) = 0$ implies $v_1 = 0$, we have $q_1^1 \notin Succ$.



[Figure: states $p_1$, $q_1^1$ and $p_0$]

Fig. 3. The construction for $i = 1$



Obviously $w_1 \notin tr(q_1^1)$, so condition (1) holds.

If $t(1) = 1$, we add the transitions $q_1^1 \xrightarrow{0,1} p_0$ (see Fig. 3). From $t(1) = 1$ it follows that
$q_1^1 \in Succ$, and $p_1 \sim q_1^1$ is obvious, so again condition (1) holds.
Inductive step ($i > 1$):

If $t(i) = 0$: We do not add any transitions. We first consider $q_i^j$ where $i > j$ (see
Fig. 4).



[Figure: $p_i \xrightarrow{0,1} p_{i-1}$, $q_i^j \xrightarrow{0,1} q_{i-1}^j$ and $q_i^{j,k} \xrightarrow{0,1} q_{i-1}^{j,k}$]

Fig. 4. The transitions added if $i > j$ and $t(i) \in \{0, 1, \wedge\}$

It is clear that $q_i^j \in Succ$ iff $q_{i-1}^j \in Succ$. So $q_i^j$ satisfies condition (1), as
can easily be checked: by the induction hypothesis, $q_{i-1}^j \in Succ$ implies $p_{i-1} \sim q_{i-1}^j$,
and $q_{i-1}^j \notin Succ$ implies $w_{i-1} \notin tr(q_{i-1}^j)$. The proof for $q_i^{j,k}$, where $i > j$, is
similar.

Now let $q$ be $q_i^j$ or $q_i^{j,k}$ where $i = j$, i.e., one of $q_i^i$, $q_i^{i,k}$. From $t(i) = 0$ we have
$v_i = 0$, and this implies $q \notin Succ$. Obviously $w_i \notin tr(q)$, because no transitions
from $q$ are possible.

If $t(i) = 1$: If $q$ is $q_i^j$ or $q_i^{j,k}$ where $i > j$, the situation is exactly the same as in
the previous case (see Fig. 4). So let $q$ be $q_i^i$ or $q_i^{i,k}$. We add transitions from $q$
as depicted in Fig. 5.




[Figure: transitions from $q_i^i$ to $p_{i-1}$ and from $q_i^{i,k}$ to $q_{i-1}^k$]

Fig. 5. The transitions added if $i = j$ and $t(i) = 1$



If $q$ is $q_i^i$, then $q \in Succ$ (because $v_i = 1$), and obviously $p_i \sim q$.

If $q$ is $q_i^{i,k}$, we can view this as the situation in which it has already been tested that $v_i = 1$
and the test continues by checking that $v_k = 1$. Because $v_i = 1$, we have $q_i^{i,k} \in Succ$
iff $q_{i-1}^k \in Succ$, so condition (1) is satisfied in $q_i^{i,k}$, as can easily be checked.

If $t(i) = \wedge$: If $q$ is $q_i^j$ or $q_i^{j,k}$ where $i > j$, the situation is the same as in the two
previous cases (see Fig. 4). So let $q$ be $q_i^i$ or $q_i^{i,k}$. We add transitions from $q$ as
depicted in Fig. 6.

If $q \in Succ$, then $v_i = 1$. Because $t(i) = \wedge$, $v_i = 1$ implies $v_{l(i)} = 1$ and $v_{r(i)} = 1$,
so $q_{i-1}^{l(i),r(i)} \in Succ$. Hence $p_{i-1} \sim q_{i-1}^{l(i),r(i)}$ by the induction hypothesis, so
the transition $p_i \xrightarrow{0} p_{i-1}$ can be matched by $q \xrightarrow{0} q_{i-1}^{l(i),r(i)}$ (and vice versa). The other transitions can
be matched as well (notice that $p_{i-1} \sim q_{i-1}^k$ if $q_{i-1}^k \in Succ$), so we have $p_i \sim q$.

Fig. 6. The transitions added if $i = j$ and $t(i) = \wedge$

Now consider the case $q \notin Succ$. If $q$ is $q_i^i$, then $v_i = 0$, and this implies $v_{l(i)} = 0$
or $v_{r(i)} = 0$, so $q_{i-1}^{l(i),r(i)} \notin Succ$. Hence $w_i \notin tr(q_i^i)$, because
$q_i^i \xrightarrow{0} q_{i-1}^{l(i),r(i)}$ is the only transition labelled with 0 leading from $q_i^i$.

If $q$ is $q_i^{i,k}$, then $v_i$ is either 0 or 1. The case $v_i = 0$ is similar to the previous one. So let
$v_i = 1$. Then $v_k = 0$, so $q_{i-1}^k \notin Succ$, from which $w_i \notin tr(q_i^{i,k})$ follows.

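If $t(i) = \vee$: For every $q \in Q_i$ we add the transitions $q \xrightarrow{0} q_{i-1}^{l(i)}$ and $q \xrightarrow{0} q_{i-1}^{r(i)}$. We
also add the transitions $p_i \xrightarrow{0} q_{i-1}^{l(i)}$ and $p_i \xrightarrow{0} q_{i-1}^{r(i)}$.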

Let $q$ be $q_i^j$ where $i > j$, as in Fig. 7. (The case when $q$ is $q_i^{j,k}$ where $i > j$ is
almost identical.) If $q_i^j \in Succ$, then $q_{i-1}^j \in Succ$, so by the induction hypothesis
$p_{i-1} \sim q_{i-1}^j$. From this, $p_i \sim q_i^j$ easily follows, because every possible transition
can be matched (for example, $p_i \xrightarrow{0} q_{i-1}^{l(i)}$ by $q_i^j \xrightarrow{0} q_{i-1}^{l(i)}$, etc.).

Fig. 7. The transitions added if $i > j$ and $t(i) = \vee$

If $q_i^j \notin Succ$, then $q_{i-1}^j \notin Succ$, so $w_{i-1} \notin tr(q_{i-1}^j)$. We need to consider two
cases: $v_i$ is either 1 or 0. If $v_i = 1$, we have $w_i \notin tr(q_i^j)$, because $q_i^j \xrightarrow{1} q_{i-1}^j$ is
the only possible transition labelled with 1. If $v_i = 0$, then $v_{l(i)} = v_{r(i)} = 0$, so
$q_{i-1}^{l(i)}, q_{i-1}^{r(i)} \notin Succ$, and from this $w_i \notin tr(q_i^j)$ easily follows.

Let us now consider the case where $q$ is $q_i^i$ or $q_i^{i,k}$. We add transitions as
depicted in Fig. 8.

If $q \in Succ$, then $p_i \xrightarrow{0} p_{i-1}$ can be matched by (at least) one of $q \xrightarrow{0} q_{i-1}^{l(i)}$ and
$q \xrightarrow{0} q_{i-1}^{r(i)}$, because $v_i = 1$ implies $v_{l(i)} = 1$ or $v_{r(i)} = 1$. All other
transitions are matched as in the previous cases, so $p_i \sim q$.
Fig. 8. The transitions added if $i = j$ and $t(i) = \vee$



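Now suppose $q \notin Succ$. If $q$ is $q_i^i$, then $v_i = 0$, so $v_{l(i)} = v_{r(i)} = 0$, and
$q_{i-1}^{l(i)}, q_{i-1}^{r(i)} \notin Succ$. From this, $w_i \notin tr(q_i^i)$ easily follows. The case when $q$ is $q_i^{i,k}$
is similar if $v_i = 0$. The only remaining possibility is that $v_i = 1$ and $v_k = 0$.
Then $q_{i-1}^k \notin Succ$, so obviously $w_i \notin tr(q_i^{i,k})$.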
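Putting the cases together gives an executable sketch of the whole reduction for small circuits, plus a brute-force check of the property claimed above. Some transition labels are inferred from the proof text rather than stated explicitly, so treat the labelling as an assumption. In the $\wedge$ and $\vee$ cases, the test on $v_{l(i)}, v_{r(i)}$ is carried by action 0, and the continuation of $q_i^i$ (to $p_{i-1}$) and of $q_i^{i,k}$ (to $q_{i-1}^k$) is carried by action 1. When $t(i) = 1$, both actions lead to the continuation. The state encoding and the names `build_A`, `bisimilar`, `has_trace` and `reduction_agrees` are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

State = Tuple  # hypothetical encoding: ("p", i), ("q", i, j), ("q", i, j, k)
LTS = Dict[State, List[Tuple[str, State]]]


def build_A(n: int, t: Dict[int, str], l: Dict[int, int], r: Dict[int, int]) -> LTS:
    """Sketch of the reduction; t[i] is "0", "1", "and" or "or", with i > l[i] > r[i]."""
    lts: LTS = {("p", 0): []}

    def add(src: State, acts: str, dst: State) -> None:
        for a in acts:
            lts.setdefault(src, []).append((a, dst))
        lts.setdefault(dst, [])

    for i in range(1, n + 1):
        # Transitions present on every level, whatever t(i) is.
        add(("p", i), "01", ("p", i - 1))
        for j in range(1, i):
            add(("q", i, j), "01", ("q", i - 1, j))
            for k in range(1, j):
                add(("q", i, j, k), "01", ("q", i - 1, j, k))
        # States with j = i: q_i^i continues to p_{i-1}, q_i^{i,k} to q_{i-1}^k.
        tops = [(("q", i, i), ("p", i - 1))]
        tops += [(("q", i, i, k), ("q", i - 1, k)) for k in range(1, i)]
        for q, cont in tops:
            lts.setdefault(q, [])  # for t(i) = 0 these states stay without transitions
            if t[i] == "1":
                add(q, "01", cont)
            elif t[i] in ("and", "or"):
                add(q, "1", cont)  # action 1 carries the continuation
            if t[i] == "and":
                add(q, "0", ("q", i - 1, l[i], r[i]))  # action 0 tests v_l(i) = v_r(i) = 1
        if t[i] == "or":
            # Every state of Q_i, and p_i too, gets 0-steps to the tests of both ancestors.
            level = [("p", i)] + [("q", i, j) for j in range(1, i + 1)]
            level += [("q", i, j, k) for j in range(1, i + 1) for k in range(1, j)]
            for s in level:
                add(s, "0", ("q", i - 1, l[i]))
                add(s, "0", ("q", i - 1, r[i]))
    return lts


def bisimilar(lts: LTS, x: State, y: State) -> bool:
    """Naive greatest-fixpoint bisimilarity; only meant for tiny systems."""
    rel = set(product(lts, lts))

    def answered(p: State, q: State) -> bool:
        return all(any(b == a and (p2, q2) in rel for b, q2 in lts[q]) for a, p2 in lts[p])

    changed = True
    while changed:
        changed = False
        for p, q in list(rel):
            if not (answered(p, q) and answered(q, p)):
                rel.discard((p, q))
                changed = True
    return (x, y) in rel


def has_trace(lts: LTS, q: State, word: str) -> bool:
    """Decide word in tr(q) by following all runs on the word at once."""
    current = {q}
    for a in word:
        current = {s2 for s in current for b, s2 in lts[s] if b == a}
    return bool(current)


def reduction_agrees(n: int, t: Dict[int, str], l: Dict[int, int], r: Dict[int, int]) -> bool:
    """v_n = 1 should give p_n ~ q_n^n; v_n = 0 should give w_n in tr(p_n) but not in tr(q_n^n)."""
    v: Dict[int, int] = {}
    for i in range(1, n + 1):
        if t[i] in ("0", "1"):
            v[i] = int(t[i])
        elif t[i] == "and":
            v[i] = v[l[i]] & v[r[i]]
        else:
            v[i] = v[l[i]] | v[r[i]]
    lts = build_A(n, t, l, r)
    if v[n] == 1:
        return bisimilar(lts, ("p", n), ("q", n, n))
    w_n = "".join(str(v[m]) for m in range(n, 0, -1))  # w_n = v_n v_{n-1} ... v_1
    return has_trace(lts, ("p", n), w_n) and not has_trace(lts, ("q", n, n), w_n)


# Circuit with output 0: the word w_4 = "0101" separates p_4 from q_4^4.
example = build_A(4, {1: "1", 2: "0", 3: "or", 4: "and"}, {3: 2, 4: 3}, {3: 1, 4: 2})
print(has_trace(example, ("p", 4), "0101"), has_trace(example, ("q", 4, 4), "0101"))  # True False
```

The brute-force bisimilarity check is far from efficient. It only illustrates condition (1) on tiny circuits and is not the LOGSPACE procedure discussed next.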

To finish the proof of Theorem 1, it remains to show that the described 
reduction is in LOGSPACE. 

The algorithm performing the reduction requires only a fixed number of variables,
such as $i, j, k$, together with the values $t(i), l(i), r(i)$ for every $i$, which are part of the
read-only input instance of mCVP. In particular, constructing the transitions
leading from a given state needs only a fixed number of such values. Obviously,
$O(\log n)$ bits are sufficient to store them. No other information is needed during
the construction, so the algorithm uses work space of size $O(\log n)$. This finishes
the proof.

Remark. The LT-system $\mathcal{A}$ contains $O(n^3)$ states and also $O(n^3)$ transitions
(because the number of possible transitions from every state is in $O(1)$ and does
not depend on $n$). Notice that a state of the form $q_i^{j,k}$ is reachable neither from $p_n$
nor from $q_n^n$ if there is no $i' \in V$ such that $t(i') = \wedge$, $l(i') = j$ and $r(i') = k$; this can
easily be proved by induction. There are at most $O(n)$ pairs $j, k$ for which such an $i'$
exists, and the existence of $i'$ for given $j, k$ can be tested in LOGSPACE.
So we can add to $\mathcal{A}$ only those states $q_i^{j,k}$ for which such an $i'$ exists. In this way we
can reduce the number of states (and transitions) to $O(n^2)$.
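Counting the three families of states makes the $O(n^3)$ bound explicit. The last sum is evaluated with the standard identity $\sum_{i=1}^{n}\binom{i}{2} = \binom{n+1}{3}$. For example, $n = 3$ gives $4 + 6 + 4 = 14$ states.

$$|S| = \underbrace{(n+1)}_{p_i} + \underbrace{\sum_{i=1}^{n} i}_{q_i^j} + \underbrace{\sum_{i=1}^{n} \binom{i}{2}}_{q_i^{j,k}} = (n+1) + \frac{n(n+1)}{2} + \frac{(n+1)n(n-1)}{6} \in O(n^3).$$

Suppose only the pairs $(j, k) = (l(i'), r(i'))$ of $\wedge$-gates $i'$ are kept. There are at most $n$ such pairs, and each contributes at most one state $q_i^{j,k}$ per level. So at most $n^2$ states of this form remain, and $|S| \in O(n^2)$.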



Additional Remarks 

We have considered complexity as a function of the size of given labelled
transition systems (which describe the state space explicitly). Rabinovich [8]
considered the problem for concurrent systems of finite agents, measuring
complexity in the size of (descriptions of) such systems; the corresponding (implicitly
represented) state space is exponential in that size. He conjectured that all
relations between bisimilarity and trace equivalence are EXPTIME-hard in this
setting. Laroussinie and Schnoebelen [5] have partially confirmed the conjecture.
They showed EXPTIME-hardness for all relations between simulation preorder
and bisimilarity, and EXPSPACE-hardness on the ‘trace equivalence end’ of van
Glabbeek’s spectrum. We plan to explore the likely possibility that our
construction can be ‘lifted’ (i.e., programmed concisely by a concurrent system of
finite agents), which would settle Rabinovich’s conjecture completely.
Abstract. To the metaphors of software engineering and software
physics can be added that of software geography. We examine the
physical and economic aspects of the Software Glacier (once an innocent
bubbling brook, now a vast frozen mass of applications imperceptibly
shaping both the Hardware Shelf below and User City above), Quantum
Planet (colonization of which could be fruitful if and when it becomes
practical), and Concurrency Frontier (an inaccessible land with rich
resources that we project will be exploited to profound economic effect
during the next half-century).



1 The Software Glacier 

It used to be that the computer was the hard thing to build; programming it
was almost an afterthought. Building a computer was hard because the problem
was so constrained by the laws of nature. Writing software, on the other hand,
was very easy, being unconstrained other than by the size and speed of available
function units, memory, and storage.

But as time went by it became harder to write software and easier to build 
computers. 

One might suppose this to be the result of the hardware constraints some- 
how loosening up while the software constraints tightened. However these basic 
constraints on hardware and software have not changed essentially in the past 
half century other than in degree. When hardware design was a vertical oper- 
ation, with complete computers being designed and built by single companies, 
hardware designers had only the laws of nature to contend with. But as the eco- 
nomics of the market place gradually made it a more horizontal operation, with 
different specialists providing different components, the need for specifiable and 
documentable interoperability sharply increased the constraints on the hardware 
designers. 

At the same time clock rates, memory capacity, and storage capacity have 
increased geometrically, doubling every eighteen or so months, with the upshot 
that today one can buy a gigabyte of DRAM and a 60 gigabyte hard drive for 
$150 each. This has enormously reduced the speed and storage constraints on 
software developers. 

So on the basis of how the constraints have evolved, hardware should be harder
than ever and software easier than ever.
Besides constraints, both tasks have become more complex. Hardware
increases in complexity as a result of decreasing line geometries combining with
increasing die size to permit more circuits to be packed into each chip. This
has two effects on the complexity of the computer as a whole. On the one hand
the additional per-chip functionality and capacity increases the overall computer
complexity. On the other hand it also facilitates a greater degree of integration:
more of the principal components of the computer can be packed into each chip,
simplifying the overall system at the board level.

The complexity of software (but not its utility and effectiveness) increases 
linearly with each of time, number of programmers, and effectiveness of pro- 
gramming tools. There is some attrition as retiring technology obsoletes some 
packages and others receive major overhauls, but mostly software tends to ac- 
cumulate. 

So we have two effects at work here, constraints and complexity. 

Hardware is complex at the chip level, but is getting less complex at the board 
level. At the same time hardware design is so constrained today as to make it 
relatively easy to design computers provided one has the necessary power tools. 
There are so few decisions to make nowadays: just pick the components you need 
from a manageably short list, and decide where to put them. The remaining 
design tasks are forced to a great extent by the design rules constraining device 
interconnection.

The combined effect of constraints and complexity on hardware then is for
system design to get easier. Chip design remains more of a challenge, but even 
there the design task is greatly facilitated by tight design rules. Overall, hardware 
design is getting easier. 

Software design on the other hand is by comparison hardly constrained at all. 
One might suppose interoperability to be a constraint, but to date interoperabil- 
ity has been paid only lip service. There is no rigid set of rules for interoperability 
comparable to the many design rules for hardware, and efforts to date to impose 
such rules on software, however well-intentioned, have done little to relieve the 
prevailing anarchy characterizing modern software. 

The dominant constraint with software today is its sheer bulk. This is not 
what the asymptotics of the situation predict, with hardware growing exponen- 
tially and software accumulating linearly. However the quantity of modern soft- 
ware is the result of twenty years of software development, assuming we start 
the clock with the introduction of the personal computer dominating today’s 
computing milieu. Multiply that by the thousands of programmers gainfully or 
otherwise employed to produce that software, then further multiply the result 
by the increase in effectiveness of programming tools, and the result is an ex- 
traordinary volume of software. 

Software today has much in common with a glacier. It is so huge as to be
beyond human control, even more so than a glacier, which might at least be
deflected with a suitable nuclear device; there is no nuclear device for software.

Software creeps steadily forward as programmers continue to add to its bulk. 
In that respect it differs from a glacier, which is propelled by forces of nature. 
This is not to say, however, that the overall progress of software is under voluntary
human control. In the small it may well be, but in the large there is no
overall guiding force. Software evolves by free will locally, but globally determinism
prevails.

Unlike a glacier, software is designed for use. We can nevertheless continue 
the analogy by viewing software users as residents of a city, User City, built on
the glacier. The users have no control over the speed, direction, or overall shape 
of the glacier but are simply carried along by it. They can however civilize the 
surface, laying in the user interface counterparts of roads, gardens, and foun- 
dations for buildings. The executive branch of this operation is shared between 
the software vendors, with the bulk currently concentrated in Microsoft. The 
research branch is today largely the bailiwick of SIGCHI, the Special Interest 
Group on Computer-Human Interfaces. 

Hardware is to software as a valley is to the glacier on it. There is only one 
glacier per valley, consisting of all the software for the platform constituting that 
valley. In the case of Intel’s Pentium and its x86 clones for example, much of the 
software comes from Microsoft, but there are other sources, most notably these 
days Linux. 

A major breakdown in this analogy is that whereas mortals have even less 
control over the valley under a real glacier than over the glacier itself, the ease 
of designing modern hardware gives us considerable control of what the software 
glacier rides on. We are thus in the odd situation of being able to bring the valley 
to the glacier. 

This is the principle on which David Ditzel founded Transmeta six years 
ago. When Ditzel and his advisor David Patterson worked out the original RISC 
(Reduced Instruction Set Computer) concept in the early 1980s, the thinking 
back then was that one would build a RISC machine which would supplant the 
extant CISC (Complex ditto) platforms. 

The building was done, by Stanford spinoffs Sun Microsystems and MIPS 
among others, but not the supplanting. In a development that academic com- 
puter architects love to hate, Intel’s 8008 architecture evolved through a series 
of CISC machines culminating in the Pentium, a wildly successful machine with 
many RISC features that nevertheless was unable to realize the most impor- 
tant benefit of RISC, namely design simplicity, due to the retention of its CISC 
origins. 

Ditzel’s idea, embarked on in 1995, was to build a Pentium from scratch, 
seemingly in the intellectual-property shelter of IBM’s patent cross-licensing 
arrangements with Intel. The CISC soul of this new machine was to be realized 
entirely in software. Transmeta’s Crusoe is a pure RISC processor that realizes 
the complexity benefits of RISC yet is still able to run the Pentium’s heavily 
CISC instruction set. 

Whether the benefits of a pure RISC design are sufficient to meet the chal- 
lenges of competing head-on with Intel remains to be seen. (The stock market 
seems to be having second thoughts on Transmeta’s prospects, with Transmeta’s 
stock currently trading at $2.50, down from $50 last November.) The point to 
be noted here is that Ditzel saw the software benefits of going to the glacier.
Just as bank robber Willie Sutton knew where the money was, Ditzel could see 
where the software was. That insight is independent of whether pure RISC is a 
sound risk. 

The question then naturally arises, must we henceforth go to the glacier, or 
is it not yet too late to start a new software glacier in a different valley? 

Apropos of this question, a new class of processors is emerging to compete 
with the Pentium. The Pentium’s high power requirements make it a good fit for 
desktops and large-screen notebooks where the backlighting power is commensu- 
rate with the CPU power and there is room for a two-hour battery. However the 
much smaller cell phones and personal digital assistants (PDAs) have room for 
only a tiny battery, ruling out the Pentium as a practical CPU choice. In its place 
several low-powered CPUs have emerged, notably Motorola’s MC68328 (Drag- 
onball) as used in the Palm Pilot, the Handspring Visor, and the Sony CLIE; the 
MIPS line of embedded processors, made by MIPS Technologies, Philips, NEC, 
and Toshiba, as used in several brands of HandheldPC and PalmPC including 
the Casio Cassiopeia; the Hitachi SH3 and SH4 as used in the Compaq Aero 8000 
and the Hitachi HPW-600 (and the Sega Dreamcast); and Intel’s StrongARM.

Software for these is being done essentially from scratch. Although Unix (SGI 
Irix) runs on MIPS, it is too much of a resource hog to be considered seriously for 
today’s lightweight PDAs. The most successful PDA operating system has been 
PalmOS for the Palm Pilot. However, Microsoft’s Windows CE, recently dubbed
Pocket PC, has lately been maturing much faster than PalmOS, and runs on
most of today’s PDAs. 

Linux has been ported to the same set of platforms on which Windows CE 
runs. The difference between Linux and Windows here is that, whereas Windows 
CE is a new OS from Microsoft, a “Mini-ME” as it were, Linux is Linux. The 
large gap between Windows CE and Windows XP has no counterpart with Linux, 
whose only limitations on small platforms are those imposed by the limited 
resources of small handheld devices. 

This uniformity among PDA platforms gives them much of the feel of a 
single valley. While there is no binary compatibility between them, this is largely 
transparent to the users, who perceive a single glacier running through a single 
valley. 

It is however a small glacier. While Linux is a rapidly growing phenomenon 
it has not yet caught up with Microsoft in sheer volume of x86 software, whence 
any port of Linux to another valley such as the valley of the PDAs lacks the 
impressive volume of the software glacier riding the x86 valley. 

The same is true of Windows CE. Even with Pocket PC 2002 to be released 
a month from this writing, the body of software available for the Windows CE 
platform is still minuscule compared to that for the x86.

Why not simply port the world’s x86 software to PDA valley? 

The problem is with the question-begging “simply.” It is not simple to port 
software much of which evolved from a more primitive time. (Fortunately for 
Linux users, essentially the whole of Linux evolved under relatively enlightened 
circumstances, greatly facilitating its reasonably successful port to PDA valley.)
It would take years to identify and calm down all the new bugs such a port would 
introduce. Furthermore it wouldn’t fit into the limited confines of a PDA. 

Now these confines are those of a 1996 desktop, and certainly there was 
already a large glacier of software for the x86 which fitted comfortably into 
those parameters back then. Why not just port that software? 

The problem with this scenario is that software has gone places in the interim, 
greatly encouraged and even stimulated by the rapidly expanding speed and 
storage capacity of modern desktops and notebooks. To port 1996 software to 
PDA valley would entail rolling back not just the excesses of modern software but 
also half a decade of bug fixes and new features. A glacier cannot be selectively 
torn into its good and bad halves. With the glitzy new accessories of today’s 
software comes geologically recent porcine fat that has been allowed to develop 
unchecked, along with geologically older relics of the software of two or more 
decades ago. Liposuction is not an option with today’s software: the fat is frozen 
into the glacier. 

Nowhere has this descent into dissipation been more visible than with the 
evolution of Windows ME and 2000 from their respective roots in Windows 
3.1 and NT. (The emphasis is on “visible” here: this scenario has played out 
elsewhere, just not as visibly.) Whereas 16 MB was adequate for RAM in 1994, 
now 128 MB is the recommended level. Furthermore the time to boot up and 
power down has increased markedly. 

The one PDA resource that is not currently in short supply, at least for PDAs 
with a PCMCIA slot such as HP-Compaq’s iPAQ, is the hard drive. Toshiba 
and Kingston have been selling a 2 GB Type II (5 mm thick) PCMCIA hard 
drive for several months, and Toshiba has just announced a 5 GB version. Here 
the exponential growth of storage capacity has drawn well clear of the linear 
production of the world’s software, which stands no chance now of ever catching 
up (except perhaps for those very few individuals with a need and budget for 
ten or more major desktop applications on a single PDA). However space on 
a PDA for all one’s vacation movies, or Kmart’s complete assets and accounts 
receivable database, will remain tight for the next two to four years. 

Windows XP, aka NT 5.1, is giving signs of greater sensitivity to these
concerns. XP Embedded is not Windows CE but rather XP stripped for action
in tight quarters. And Microsoft distributes an impressive library of advice on 
shortening XP boot time under its OnNow initiative. Unfortunately the appli- 
cations a power XP user is likely to run are unlikely to show as much sensitivity 
and shed their recently acquired layers of fat in the near future. 

With these considerations in mind, the prospects for Windows CE/Pocket 
PC look very good for 2002, and perhaps even 2003. But the exponential growth 
of hardware is not going to stop then. We leave it to the reader to speculate on 
the likely PDA hardware and software picture in 2004. 
2 Quantum Planet

2.1 Quantum Computation 

Quantum computation (QC) is a very important and currently hot theoretical 
computer science topic. 

The practical significance of QC is less clear, due to a gap of approximately 
two orders of magnitude between the largest quantum module of any kind we 
can build today and the smallest error-correcting modules currently known from 
which arbitrarily large quantum computers can be easily manufactured. This 
gap is presently closing so slowly that it is impossible to predict today whether
technology improvements will accelerate the closing, or unanticipated obstacles 
will emerge to slow it down or even stop it altogether on new fundamental 
grounds that will earn some physicist, quite possibly even one not yet born, a 
Nobel prize. If this gap turns out to be unclosable for any reason, fundamental 
or technological, QC as currently envisaged will remain a theoretical study of 
little more practical importance than recursion theory. 

The prospect of effective QC is as remote as manned flight to distant planets. 
The appropriate metaphorical location for QC then is not a city or even a distant 
country but another planet altogether, suggesting the name Quantum Planet. 

On the upper-bound side, there are depressingly few results in theoretical 
QC that have any real significance at all. The most notable of these is P. Shor’s 
extraordinary quantum polynomial-time factoring algorithm and its implications 
for the security of number-theoretic cryptography [15]. 

There has been some hope for quantum polynomial-time solutions to all 
problems in NP. One approach to testing membership in an NP-complete set 
is via an algorithm that converts any classical (deterministic or probabilistic) 
algorithm for testing membership in any set into a faster quantum algorithm for 
membership in that set. Grover has given a quadratic quantum speedup for any 
such classical algorithm [8]. 

On the lower-bound side, several people have shown that Grover’s speedup is 
optimal. But all that shows about membership in an NP-complete set is that a 
good quantum algorithm based on such an approach has to capitalize somehow 
on the fact that the set is in NP. Any method that works for the more general 
class of sets treatable by Grover’s algorithm cannot solve this problem on its 
own. 

We would all like to see QC turn out to be a major planet. For now it is 
turning out to be just a minor asteroid. I see this as due to the great difficulty 
people have been experiencing in getting other good QC results to compare with 
Shor’s. 

2.2 Quantum Engineering 

There is a saying: be careful what you wish for, you may get it. While
computer scientists are eagerly pursuing quantum computation with its promise of
exponentially faster factoring, computer engineers are nervously anticipating a 
different kind of impact of quantum mechanics on computation. At the current
rate of shrinkage, transistors can expect to reach atomic scale some time
towards the end of this century’s second decade, i.e. before 2020 AD. (So with
that timetable, as Quantum planets go, Quantum Engineering is Mars to
Quantum Computation’s Neptune.)

As that scale is approached the assumptions of classical mechanics and classi- 
cal electromagnetism start to break down. Those assumptions depend on the law 
of large numbers, which serves as a sort of information-theoretic shock-absorber 
smoothing out the bumps of the quantum world. When each bit is encoded 
in the state of a single electron, the behavior of that bit turns from classical 
to quantum. Unlike their stable larger cousins, small bits are both fickle and 
inscrutable. 

Fickle. In the large, charge can leak off gradually but it does not change 
dramatically (unless zapped by an energetic cosmic ray). In the small however, 
electrons are fickle: an electron can change state dramatically with no external 
encouragement. The random ticks of a Geiger counter, and the mechanism by 
which a charged particle can tunnel through what classically would have been 
an impenetrable electrostatic shield, are among the better known instances of 
this random behavior. 

Inscrutable. Large devices behave reasonably under observation. A measure- 
ment may perturb the state of the device, but the information gleaned from the 
measurement can then be used to restore the device to its prior state. For small
devices this situation paradoxically reverses itself. Immediately after observing 
the state of a sufficiently small device, one can say with confidence that it is 
currently in the state it was observed to be in. What one cannot guarantee how- 
ever is that the device was in that state before the measurement. The annoying 
thing is that the measurement process causes the device to first change state at 
random (albeit with a known contingent probability) and then to report not the 
old state but the new! The old state is thereby lost and cannot be reconstructed 
with any reasonable reliability. So while we have reliable reporting of current 
state, we do not have reliable memory of prior state. 
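The inscrutability described above can be mimicked for a single two-state device. The following Python sketch is a minimal illustration, assuming the standard textbook model of a qubit as a pair of complex amplitudes measured in the 0/1 basis, with the Born rule supplying the known contingent probability. The function name `measure` and the state representation are hypothetical choices made here, not anything from the text.

```python
import math
import random


def measure(state, rng):
    """Measure a single qubit in the 0/1 basis (illustrative model only).

    `state` is a pair (alpha, beta) of amplitudes for the basis states 0 and 1,
    assumed normalized so that |alpha|^2 + |beta|^2 == 1.
    Returns (outcome, new_state).
    """
    alpha, _ = state
    p0 = abs(alpha) ** 2  # Born rule: probability that the device reports 0
    outcome = 0 if rng.random() < p0 else 1
    # The device jumps to the reported state; the old amplitudes are not recoverable.
    new_state = (1 + 0j, 0j) if outcome == 0 else (0j, 1 + 0j)
    return outcome, new_state


if __name__ == "__main__":
    rng = random.Random(2001)
    plus = (1 / math.sqrt(2), 1 / math.sqrt(2))  # equal superposition of 0 and 1
    first, after = measure(plus, rng)
    # Re-measuring the collapsed state always repeats the first report.
    repeats = [measure(after, rng)[0] for _ in range(5)]
    print(first, repeats)
```

Reporting is reliable: once a 0 or 1 has been reported, every later measurement agrees with it. Memory is not: the starting state (1, 0) and the equal superposition `plus` can both report 0 and leave exactly the same post-measurement state, so nothing in the report distinguishes them.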

So while quantum computation focuses on the opportunities presented by 
the strangeness of the quantum world, the focus of quantum engineering is on 
its challenges. Whereas quantum computation promises an exponential speedup 
for a few problems, quantum engineering promises to undermine the reliability 
of computation. 

Quantum engineering per se is by no means novel to electrical engineers. 
Various quantum effects have been taken advantage of over the years, such as 
Goto’s tunnel diode [6], the Josephson junction [9], and the quantum version of 
the Hall effect [16]. However these devices achieve their reliability via the law of 
large numbers, as with classical devices, while making quantum mechanics work 
to the engineer’s advantage. Increasing bit density to the point where there are 
as many bits as particles makes quantum mechanics the adversary, and as such 
very much a novelty for electrical engineers. 
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3 Concurrency Frontier 

Parallel computation is alive and well in cyberspace. With hundreds of millions 
of computers already on the Internet and millions more joining them every day, 
any two of which can communicate with each other, it is clear that cyberspace 
is already a highly parallel universe. 

It is however not a civilized universe. Concurrency has simply emerged as 
a force of cybernature to be reckoned with. In that sense concurrency remains 
very much a frontier territory, waiting to be brought under control so that it can 
be more efficiently exploited. 

Cyberspace is still very much a MIMD world: Multiple Instructions operating
on Multiple Data elements. It is clearly not a SIMD (Single Instruction,
Multiple Data) world, the science-fiction scenario of a single central
intelligent agent controlling an army of distributed agents.

Nevertheless some SIMD elements are starting to emerge, in which the In- 
ternet’s computers act in concert in response to some stimulus. The Y2K threat 
was supposed to be one of these, with all the computers in a given time zone 
misbehaving at the instant the year rolled over to 2000 in that time zone. 

The fact that most computers today in the size spectrum between notebooks 
and desktops run Microsoft Windows has facilitated the development of com- 
puter viruses and worms. A particularly virulent worm is the recent W32.Sircam, 
which most computer users will by now have received, probably many times, as 
a message beginning “I send you this file in order to have your advice.” If it 
takes up residence on a host, this worm goes into virus mode under various 
circumstances, one of which is the host’s date being October 16. In this mode 
it deletes the C: drive and/or fills available space by indefinitely growing a file 
in the Recycled directory. As of this writing (September) it remains to be seen 
whether this instance of synchronicity will trigger a larger cyberquake than the 
mere occurrence of the year 2000. 

More constructive applications of the SIMD principle have yet to make any 
substantial impact on the computing milieu. However as the shrinking of de- 
vice geometries continues on down to the above-mentioned quantum level and 
sequential computation becomes much harder to speed up, computer architects 
will find themselves increasing the priority of massively parallel computation, 
not just for a handful of CPUs but for thousands and even millions of devices
acting in concert. 

Thinking Machines Corporation’s CM-2 computer was an ambitious SIMD 
machine with up to 64K processors. Going by processor count, this is impressive 
even today: then it yielded a local bus bandwidth of 40 GB/s, and with today’s 
faster buses could be expected to move terabytes per second locally. However 
main memory was 0.5 GB while hard drive storage ran to 10 GB, capacity that 
today can be matched by any hobbyist for $150. The blazing speed was definitely 
the thing. 

SIMD shows up on a smaller scale in Intel's 64-bit-parallel MMX architecture
and 128-bit SSE architecture, as well as in AMD's 64-bit 3DNow! architecture.
However this degree of parallelism yields only small incremental gains, sufficient
to edge ahead of the competition in the ongoing speed wars.
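In these instruction sets a single instruction treats a wide register as several independent narrow lanes. The Python sketch below emulates in software the kind of lane-wise wraparound addition that MMX's PADDW instruction performs on four 16-bit lanes of a 64-bit word. It uses the well-known SWAR ("SIMD within a register") masking trick; the helper names are hypothetical, and the sketch shows the semantics only, not how the hardware implements them.

```python
LANE_BITS = 16
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1
WORD_MASK = (1 << (LANE_BITS * LANES)) - 1
# The top bit of each 16-bit lane, i.e. 0x8000800080008000.
HIGH_BITS = sum(1 << (LANE_BITS * i + LANE_BITS - 1) for i in range(LANES))
LOW_BITS = WORD_MASK & ~HIGH_BITS


def pack(values):
    """Pack four unsigned 16-bit values into one 64-bit integer, lane 0 lowest."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (LANE_BITS * i)
    return word


def unpack(word):
    """Split a 64-bit integer back into its four 16-bit lanes."""
    return [(word >> (LANE_BITS * i)) & LANE_MASK for i in range(LANES)]


def packed_add16(x, y):
    """Lane-wise wraparound addition of four 16-bit lanes held in one 64-bit word."""
    # Add the low 15 bits of every lane; each sum fits in 16 bits, so no carry
    # can cross a lane boundary.
    partial = (x & LOW_BITS) + (y & LOW_BITS)
    # Recover each lane's top bit with XOR; any carry out of the lane is dropped.
    return (partial ^ ((x ^ y) & HIGH_BITS)) & WORD_MASK


print(unpack(packed_add16(pack([1, 2, 0xFFFF, 0x7FFF]), pack([1, 3, 1, 1]))))
# [2, 5, 0, 32768]
```

Masking off the top bit of every lane before the ordinary integer addition is what keeps the four additions independent; dropping the carry out of each lane gives the wraparound behavior, so 0xFFFF + 1 becomes 0 in its lane without disturbing its neighbor.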

As some of the more fundamental bottlenecks start showing up, evasive ma- 
neuvers will become more necessary. A return to CM-2 scale parallelism and 
higher is certainly one approach. 

A major difficulty encountered at Thinking Machines was how to describe 
the coordinated movement and transformation of data in such highly parallel 
machines. 

Rose and Steele [14] and Blelloch and Greiner [1] came up with some novel 
programming language approaches to Thinking Machines’ problem. What struck 
me about these approaches was how hard it is to separate ourselves from the 
sequential modes of thought that pervade our perception of the universe. It 
is as though we cannot see pure concurrency. If we were ever confronted with 
it, we simply would not recognize it as computation, since it would lack those
sequential features that mark certain behaviors for us as essentially
computational, in particular events laid out along a time line.

Programmers who would like to leave a useful legacy to their heirs need to 
spend more time reflecting on the nature of concurrency, learning what it feels 
like and how to control it. My own view of concurrency is that event struc- 
tures [10,17,18] offer a good balance of abstractness and comprehensiveness in 
modeling concurrency. 
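For readers unfamiliar with event structures, the following Python sketch shows one common variant, the prime event structure: a set of events with a causality order and a conflict relation, whose configurations (possible states of progress) are the sets of events that are closed under causes and contain no conflicting pair. The example events and the function name are hypothetical, and the brute-force enumeration is only practical for a handful of events.

```python
from itertools import combinations


def configurations(events, causes, conflict):
    """Enumerate the configurations of a small prime event structure.

    events:   the event names
    causes:   dict mapping an event to the set of events that must precede it
              (assumed transitively closed)
    conflict: set of frozenset pairs {e, f} that can never both occur
              (assumed inherited along causality, as the definition requires)
    """
    names = sorted(events)
    found = []
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            s = frozenset(subset)
            closed = all(causes.get(e, set()) <= s for e in s)
            consistent = not any(pair <= s for pair in conflict)
            if closed and consistent:
                found.append(s)
    return found


# After a, exactly one of b or c may follow; d is unrelated to the others.
example = configurations(
    events=["a", "b", "c", "d"],
    causes={"b": {"a"}, "c": {"a"}},
    conflict={frozenset({"b", "c"})},
)
print(len(example))  # 8
```

Event d is concurrent with everything else: it can be added to any of the four configurations {}, {a}, {a, b}, {a, c} without any order being imposed relative to a, b, or c. The model records what may happen together, not a single time line.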

The extension of event structures to Chu spaces, or couples as I have started 
calling them [12], simultaneously enriches the comprehensiveness while cleaning 
up the model to the point where it matches up to Girard's linear logic [4].
The match-up is uncannily accurate [2] given that linear logic was not intended 
at all as a process algebra but as a structuring of Gentzen-style proof theory. 

There is another ostensibly altogether different approach to the essence of 
concurrency, that of higher dimensional automata [11,5,3,7]. It is however
possible to reconcile this approach with the Chu space approach by working with
couples over 3, that is, a three-letter alphabet for the basic event states of before, 
during, and after [13]. It is my belief that the three-way combination of duality- 
based couples, geometry-based higher-dimensional automata, and logic-based 
linear logic, provides a mathematically richer yet simpler view of concurrency 
than any other approach. 
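The effect of the third letter can be seen by counting states. The Python sketch below is a hypothetical illustration, not the construction of [13]: it assigns each of two events one of the letters before, during, after, and groups the resulting states by dimension, meaning the number of events currently in progress. Two truly concurrent events fill the whole 3 by 3 grid, including the single two-dimensional state in which both are in progress at once. Two events constrained never to overlap (mutual exclusion, the interleaving view) lose exactly that state. Over the two letters before and after alone, both processes have the same four states, so the distinction is invisible there.

```python
from collections import Counter
from itertools import product

BEFORE, DURING, AFTER = "before", "during", "after"


def states(events, allowed=lambda s: True):
    """All assignments of before/during/after to the events that satisfy `allowed`."""
    result = []
    for letters in product((BEFORE, DURING, AFTER), repeat=len(events)):
        s = dict(zip(events, letters))
        if allowed(s):
            result.append(s)
    return result


def dimension_profile(state_list):
    """Count states by dimension, i.e. by how many events are in progress."""
    return Counter(sum(v == DURING for v in s.values()) for s in state_list)


# a and b run independently: every combination of letters is possible.
concurrent = states(["a", "b"])
# Mutual exclusion: a and b both happen, but are never in progress together.
interleaved = states(
    ["a", "b"], allowed=lambda s: not (s["a"] == DURING and s["b"] == DURING)
)

print(dimension_profile(concurrent))   # 4 points, 4 edges, 1 filled square
print(dimension_profile(interleaved))  # 4 points, 4 edges, no square
```

In geometric terms the concurrent pair is a filled square and the interleaved pair only its boundary, a difference that a two-letter alphabet of event states cannot express.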

The economic promise of the Concurrency Frontier is its potential for further
extending the power of computers when the limits set by the speed of light and 
quantum uncertainty bring the development of sequential computation to a halt. 
I do not believe that the existing sequentiality-based views of concurrency permit
this extension; I believe instead that we need to start understanding the interaction
of complex systems in terms of the sorts of operations physicists use to combine
Hilbert spaces, in particular tensor product and direct sum. The counterparts
of these operations for concurrency are respectively orthocurrence $A \otimes B$ and
concurrence $A \| B$ (also $A \oplus B$), or tensor and plus as they are called in linear
logic. Closer study of this point of view will be well rewarded.
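As a concrete illustration of the two operations, the Python sketch below works with small Chu spaces over the alphabet {0, 1}, read as "not yet occurred" and "occurred", restricted to the extensional case where each state is just the tuple of values it assigns to the points. Concurrence $A \oplus B$ takes the disjoint union of the points and pairs up states independently. For orthocurrence $A \otimes B$ the sketch assumes the standard characterization of its states, for extensional spaces, as the matrices over $A \times B$ whose every row is a state of B and every column a state of A, enumerated here by brute force. The function names and the representation are assumptions made for this example.

```python
from itertools import product


def chu_sum(A, B):
    """Concurrence A (+) B: disjoint union of points; states pair a state of A with one of B."""
    (points_a, states_a), (points_b, states_b) = A, B
    points = tuple(("A", p) for p in points_a) + tuple(("B", p) for p in points_b)
    states = {x + y for x in states_a for y in states_b}
    return points, states


def chu_tensor(A, B, alphabet=(0, 1)):
    """Orthocurrence A (x) B by brute force over all A-by-B matrices.

    A matrix qualifies when each row (fixed point of A) is a state of B
    and each column (fixed point of B) is a state of A.
    """
    (points_a, states_a), (points_b, states_b) = A, B
    n, m = len(points_a), len(points_b)
    points = tuple((a, b) for a in points_a for b in points_b)
    states = set()
    for flat in product(alphabet, repeat=n * m):
        rows = [flat[i * m:(i + 1) * m] for i in range(n)]
        cols = [tuple(row[j] for row in rows) for j in range(m)]
        if all(r in states_b for r in rows) and all(c in states_a for c in cols):
            states.add(flat)
    return points, states


# Two events in sequence: e2 can only have occurred if e1 has.
chain = (("e1", "e2"), {(0, 0), (1, 0), (1, 1)})

print(len(chu_sum(chain, chain)[1]))     # 9
print(len(chu_tensor(chain, chain)[1]))  # 6
```

For two such two-step sequences, concurrence allows all 3 × 3 = 9 combinations of progress, while orthocurrence admits only the 6 states in which the occurred entries form a down-closed set in the 2 × 2 grid of event pairs.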